Abstract

This thesis determines whether an artificial neural network (ANN) can
approximate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head
related transfer function (HRTF) data obtained from research at AAMRL during
the fall of 1988. To test this hypothesis, two separate tests are performed.

The first test determines whether the HRTF lends any support in sound
localization when compared to no HRTF (Interaural Time Delay only). Depending on
the statistical method used, we can conclude that the HRTF does provide an
advantage in localization accuracy over simple ITD. There is, however, a
statistically significant interaction between the location of the sound and
whether the HRTF or no HRTF is used. When this interaction is removed using the
alternate F-value, the statistics give the result of equal means for the filters
and azimuth. This means that at certain angles of azimuth, the HRTF either
provides no advantage at all or hinders localization capabilities. Adding in the
corrections for reversals changes the results so that the means of the azimuth
are no longer statistically equal. The reversal corrections will inherently
reduce the error results. However, comparing the number of reversals indicates
an advantage of using the HRTF over no HRTF.

The second test determines whether the AAMRL HRTF and the ANN HRTF lend the
same amount of information in sound localization when compared to each other.
Without considering reversals, we can conclude that the RBF HRTF provides a
statistical advantage in localization accuracy over the AAMRL HRTF from which it
was derived. However, with reversal corrections included, an advantage is not
indicated, and the two filters are statistically equal. Also with reversal
corrections included, there is a statistically significant interaction between
the location of the sound and whether the AAMRL HRTF or ANN HRTF is used. As
discussed in the first experiment, this means that at certain angles of azimuth
the HRTF either provides no advantage at all or hinders localization
capabilities. However, comparing the number of reversals does not indicate a
large difference between the AAMRL HRTF and the RBF HRTF.

The  final  conclusion  is  that  the  AAMRL  HRTFs  can  be  approximated  by  an 
artificial  neural  network  for  azimuth  positions  at  zero  degrees  elevation  and  still  have 
good  results. 
Head Related Transfer Function Approximation Using Neural Networks


I.  Introduction 

1.1  Background 

The  visualization  of  multivariate  data  is  a  complex  area.  There  are  a  number  of 
ways  to  help  view  the  data.  The  conventional  approach  for  the  presentation  of  data 
is  to  present  it  in  tabular  or  graphical  form.  Common  graphical  techniques  work  fine 
for  three  dimensions  or  less.  For  example,  figure  1  shows  some  typical  presentation 
forms  of  data.  In  each  case,  the  presentation  utilizes  only  the  human  sense  of  sight. 

Figure 1. Typical presentation forms of data (panels: 3D plot, sphere)


The other four senses (auditory, touch, taste and smell) are not exploited. The
presentations in figure 1, as depicted, sound the same. Except for the thickness
of the ink on the page, they feel the same. They also taste the same and smell
the same.
Of  these  four  unused  senses,  the  auditory  sense  has  the  most  practical  potential  (6). 
For  example,  in  a  study  of  sleeping  subjects,  the  subjects  responded  faster  to  an 
auditory  alarm  than  to  the  presence  of  heat  or  the  smell  of  smoke  (10,  22).  The 
fastest simple reaction times come from electric shock (130 ms), followed by
audio and tactile stimuli (140 ms) and finally visual stimuli (180 ms) (22).
Sleeping is a case where a visual display is not appropriate. In the following
circumstances an auditory display may be preferable to a visual display (22):

•  When  the  origin  of  the  signal  is  itself  a  sound 

•  When  the  message  is  simple  and  short 

•  When  the  message  will  not  be  referred  to  later 

•  When  the  message  deals  with  events  in  time 

•  When  warnings  are  sent  or  when  the  message  calls  for  immediate  action 

•  When  continuously  changing  information  of  some  type  is  presented,  such  as 
aircraft,  radio  range,  or  flight-path  information 

•  When  the  visual  system  is  overburdened 

•  When  speech  channels  are  fully  employed  (in  which  case  auditory  signals  such 
as  tones  should  be  clearly  detectable  from  the  speech) 

•  When  illumination  limits  use  of  vision 

•  When  the  receiver  moves  from  one  place  to  another 

•  When  a  verbal  response  is  required 

Many  systems  make  use  of  different  audio  and  visual  displays.  Sound  displays 
are  not  limited  to  one  dimension  (monophonic)  or  two  dimensions  (stereophonic). 
Three dimensional (binaural) sound is where the listener hears the direction and
distance of the sound. Begault and Wenzel list some advantages and possible
applications of binaural systems (3):

•  monitor  and  identify  sources  of  information  from  all  possible  locations 

•  improve  intelligibility  of  sources  in  noise 

•  enhance  segregation  of  multiple  sources  of  speech 

•  improve  air  traffic  control  displays 

•  applications  in  cockpit  communications 

•  telepresence  applications  such  as  teleconferencing 

•  shared  electronic  workspaces 

•  telerobotic  control 

A potential application listed above is in the cockpit (3, 8, 22). Although
sound is already an important tool in the cockpit, only 1D sound is used. The pilot
perceives  the  sound  as  coming  from  his  or  her  earphones  or  from  inside  his  or  her 
head.  In  3D  sound,  the  pilot  perceives  the  sound  as  coming  not  from  the  earphones  or 
inside  the  head,  but  from  a  location  outside  the  head  at  a  certain  azimuth,  elevation 
and  distance.  By  using  3D  sound,  the  position  of  the  pilot’s  wingman  can  then  be 
encoded  into  three  dimensions  so  that  when  the  pilot  hears  the  wingman  speak  he 
or  she  perceives  the  wingman’s  voice  as  coming  from  the  wingman’s  relative  position 
to  that  of  the  pilot.  In  the  same  way,  targets  and  threats  can  be  presented  to  the 
pilot.  Active  and  non  active  threats  could  be  distinguished  by  different  pitches.  The 
distance  that  the  sound  is  perceived  from  the  head  could  indicate  the  urgency  of 
the  data.  Immediate  action  information  could  be  placed  close  to  the  head  while  less 
urgent  information  could  be  placed  farther  away.  Begault,  in  an  experiment  involving 
commercial  pilots  and  collision  warning  systems,  found  that  air  crews  presented  with 
3D  auditory  data  acquired  targets  approximately  2.2  seconds  faster  than  air  crews 
that were presented only 1D auditory data (2).

In another example, data from an Army battlefield simulation was encoded
into  sound  creating  “battlefield  songs”  of  the  data  (6).  For  example,  in  a  song  with 
only  three  variables:  rhythm,  tempo  and  pitch;  rhythm  may  indicate  the  combatants 
of  the  battle,  tempo  may  indicate  the  number  of  troops  moving  toward  the  front  and 
pitch  may  indicate  the  number  of  troops  already  at  the  front.  Using  this  scenario, 
a  battlefield  song  of  one  rhythm  that  is  increasing  in  tempo  and  pitch  and  another 
rhythm  that  has  steady  tempo  and  pitch  would  indicate  one  combatant  of  the  battle 
had  continuous  reinforcements;  while  the  other  combatant  did  not  get  reinforcements 
and  was  losing  troops  at  the  front.  In  the  multidimensional  case,  listeners  had 
difficulty  hearing  the  two  sides  of  the  battle  independently;  however,  3D  sound  may 
help  separate  the  songs  better  by  separating  the  songs  to  different  locations  outside 
the  head  (3). 

1.2  Problem 

This thesis determines whether an artificial neural network (ANN) can
approximate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head
related transfer function (HRTF) data obtained from research at AAMRL during the
fall of 1988 (12). To test this hypothesis, two separate tests are performed.
The first test determines whether the HRTF lends any support in sound
localization when compared to no HRTF (Interaural Time Delay only). The second
test determines whether the AAMRL HRTF and the ANN HRTF lend the same amount of
support in sound localization when compared to each other.

1.3  Definitions 

This section provides definitions of key terms that will be used in this
thesis (25).

Binaural sound is sound that arises from two separate audio signals, one at
each ear. In a natural environment, humans hear binaural sound. The signal that
arrives at the left ear is characteristically different from the signal at the
right ear. The brain uses the differences between the two signals to determine
the direction and distance of the source. Binaural sound is usually recorded
with a manikin, as opposed to stereo sound, which is recorded without a manikin.
The left ear signal is recorded separately from the right ear signal. When sound
is played through headphones, the sound is binaural if the signal sent to the
left headphone is separate from the signal sent to the right (23:3).

Binaural mixing console is an electronic device that contains filters which
convert a monophonic signal to a binaural signal corresponding to a given
distance and direction (15:208).

Telepresence  is  a  human’s  perception  of  being  present  in  a  natural  environment 
while  actually  being  present  in  an  artificial  or  virtual  environment. 

Virtual  audio  is  synthetically  produced  audio  signals  which  enable  a  listener 
to  achieve  auditory  telepresence. 

Extracranialized  sound  is  sound  that  is  presented  to  a  listener  in  the  form  of  a 
binaural  signal  through  headphones  and  is  perceived  as  coming  from  some  distance 
from  the  listener  (i.e.  outside  the  head). 

Pinna(e)  is(are)  the  human  outer  ear(s).  The  design  of  each  person’s  pinnae 
is  unique  and  each  set  of  pinnae  will  uniquely  filter  sounds.  In  fact,  the  human  head 
and ears form an antenna system characteristic to the individual (24).
Experiments have shown that spectral shaping by the pinnae is dependent upon
direction, and that the cues provided by the pinnae are critical in
extracranializing the sound (30:84).

Head  related  transfer  function  (HRTF)  is  the  transfer  function  which  accounts 
for  the  filtering  effects  of  the  pinnae.  Because  the  filtering  of  sound  by  the  pinnae  is 
dependent  upon  direction,  there  is  a  different  HRTF  for  each  angle  of  azimuth  and 
elevation. HRTFs will differ for each person’s pinnae; however, these
differences are relatively small. Accurate localization of sound sources can be
made using “someone else’s pinnae” by presenting sounds filtered with another
person’s HRTFs (31).

Auditory Localization Cues are cues that change the sound as a function of
sound source and receiver location. There are four primary types of cues:
(1) Monaural Temporal Cues, (2) Monaural Spectral Cues, (3) Binaural Temporal
Cues, and (4) Binaural Spectral Cues. Spectral cues affect the spectrum of the
sound that reaches the ear, while temporal cues affect only the arrival time of
the sound signal (31).

1.4  Research  Objectives 

In order to solve the stated problem, the following research objectives will be
met:

•  Modify  AFIT  algorithms  that  have  been  used  successfully  on  placing  sounds  in 
azimuth  and  elevation.  The  algorithms  that  create  3D  sound  with  the  AAMRL 
HRTF  will  be  changed  to  use  the  ANN  HRTF  instead.  This  will  allow  the 
creation  of  tests  to  compare  the  ANN  HRTF  with  the  AAMRL  HRTF. 

•  Statistically  check  the  advantage  of  HRTF  and  ITD  versus  ITD  only.  The 
results  of  the  first  experiment  will  provide  a  significantly  large  data  base  to 
analyze. 

•  Statistically  check  the  difference  between  AAMRL  HRTF  data  and  ANN  HRTF 
data.  The  results  of  the  second  experiment  will  provide  a  significantly  large 
data  base  to  analyze. 

•  Investigate the program LNKmap as a neural network tool for function
approximation.

1.5  Scope 

This thesis will process and analyze experimental data to obtain results of the
two experiments. The differences in sound localization accuracies from the HRTF
versus no HRTF experiment and the AAMRL HRTF versus ANN HRTF experiment will
both be analyzed by the analysis of variance method (14). To limit the amount of
data to be processed, only the 24 azimuth locations at zero elevation will be
analyzed. This will reduce both the neural network training time and the amount
of data to be analyzed.
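As a rough illustration of the kind of analysis of variance named above, the following sketch computes the sums of squares and F ratios for a balanced two-factor layout (filter by azimuth) with an interaction term. It is a generic fixed-effects textbook layout, not the model developed later in the thesis; the function name `two_way_anova`, the array layout, and the number of replications are assumptions made only for this example.

```python
import numpy as np

def two_way_anova(errors):
    """Balanced two-factor ANOVA with interaction.

    errors[i, j, k] is the k-th localization error observed for
    filter i at azimuth j (hypothetical layout, equal replications).
    Returns sums of squares, degrees of freedom and F ratios.
    """
    a, b, n = errors.shape
    grand = errors.mean()
    mean_a = errors.mean(axis=(1, 2))   # one mean per filter
    mean_b = errors.mean(axis=(0, 2))   # one mean per azimuth
    cell = errors.mean(axis=2)          # one mean per (filter, azimuth) cell

    ss = {
        "filter": b * n * np.sum((mean_a - grand) ** 2),
        "azimuth": a * n * np.sum((mean_b - grand) ** 2),
        "interaction": n * np.sum(
            (cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2
        ),
        "error": np.sum((errors - cell[:, :, None]) ** 2),
    }
    df = {
        "filter": a - 1,
        "azimuth": b - 1,
        "interaction": (a - 1) * (b - 1),
        "error": a * b * (n - 1),
    }
    ms_error = ss["error"] / df["error"]
    f_ratio = {
        key: (ss[key] / df[key]) / ms_error
        for key in ("filter", "azimuth", "interaction")
    }
    return ss, df, f_ratio
```

With two filters and 24 azimuths, `errors` would have shape `(2, 24, n)`; a large interaction F ratio corresponds to the situation where the benefit of a filter depends on the azimuth.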


1.6  Approach 


1.6.1 Synthesizing 3D Cues. Humans use auditory localization cues to
spatially localize sound. Three primary characteristics of the auditory signals
are attenuation, time of arrival (TOA) and HRTF. As the sound travels to the
body, obstacles such as walls, furniture or even air attenuate the sound. If the
head is not directly facing the sound wave, the distance from each ear to the
sound source will be different. This gives the brain the TOA cue. Once the sound
reaches the body, the head, the torso, the nose and the outer ear or pinna
filter the sound wave. This filtering function is known as the HRTF. The HRTF is
analogous to an analog signal processing filter (figure 2). As in an analog
filter, where an input signal is transformed by a system function, H, to an
output signal, the HRTF transformation of the sound appears to aid the brain in
localization perception (figure 3).

Figure 2. Block diagram of a typical signal processing transfer function
(block labels: input signal, output signal)


The factors which affect the HRTF include distance ($d$), elevation ($\phi$),
azimuth ($\theta$), frequency ($\omega$), torso and head size. The frequency of
the sound plays an important role in determining the HRTF. Although the HRTF
varies for frequency and for each point in space, the torso and head size can be
generalized from person to person (3). For example, the HRTF for a sound located
at 30° elevation, 20° azimuth, 10 feet and 440 Hz will be different from the
HRTF for a sound located at 30° elevation, 20° azimuth, 10 feet and 1000 Hz.
However, the sound transformed by the same HRTF will generally be perceived as
coming from the same direction by people with varying size heads and torsos.

Figure 3. HRTF transforms sound to aid the brain in location perception
(input: sound source; HRTF: head, torso, pinnae; output: inner ear and brain)

The elevation ($\phi$) of the sound is measured from the ground plane through
the ears to a sound source (figure 4). The azimuth ($\theta$) of the sound is
measured from directly in front of the face counter-clockwise to the sound
source (figure 5). The distance ($d$) of the sound is measured from the center
of the head to the sound source (figure 6).

Figure 4. Elevation, $\phi$, measured from the ground plane to a sound source

All of these parameters can be measured in a sphere (figure 7) to determine
the HRTF for any location and frequency. To measure the AAMRL HRTF, an
anatomically correct manikin bust is positioned at the center of the anechoic
chamber sphere with microphones placed inside the ear canals. Pure sine waves
produced by the speakers are recorded with the ear microphones and used to
determine the HRTF. The sine wave frequency is held constant until that
particular HRTF is determined and then it is incremented for the next HRTF
sample. The speaker location contains the azimuth, elevation and distance
information. Smith gives the location in azimuth and elevation of 272 speakers
used to determine the HRTF by the above method (25). Once all the HRTFs are
calculated, they can be used to filter any sound and its reflections. When
attenuation and TOA are combined with the filtered sound, any recorded sound can
be replayed to the listener through headphones to give the impression of a sound
located at a distinct direction from outside the head.

Figure 7. Geodesic sphere with sound sources at multiple locations (23:8)
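The synthesis chain described above (HRTF filtering combined with TOA and attenuation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AFIT algorithm: the function name `render_binaural`, the representation of each HRTF as a magnitude response on the FFT grid, the whole-sample delay, and the inverse-distance gain are all hypothetical choices made for the example.

```python
import numpy as np

def render_binaural(mono, fs, mag_left, mag_right, itd_s, distance_m):
    """Return (left, right) headphone signals for a mono input.

    mag_left, mag_right: hypothetical HRTF magnitude responses sampled
        on the rfft grid of `mono` (length len(mono) // 2 + 1).
    itd_s: time-of-arrival difference in seconds; a positive value
        delays the right ear, a negative value delays the left ear.
    distance_m: source distance; the 1/distance gain is an assumption.
    """
    n = len(mono)
    spectrum = np.fft.rfft(mono)

    # HRTF cue: shape the spectrum separately for each ear.
    left = np.fft.irfft(spectrum * mag_left, n=n)
    right = np.fft.irfft(spectrum * mag_right, n=n)

    # TOA cue: delay the far ear by a whole number of samples.
    delay = int(round(abs(itd_s) * fs))
    if delay > 0:
        pad = np.zeros(delay)
        if itd_s > 0:
            right = np.concatenate([pad, right])[:n]
        else:
            left = np.concatenate([pad, left])[:n]

    # Attenuation cue: simple inverse-distance gain, clamped at 1 m.
    gain = 1.0 / max(distance_m, 1.0)
    return gain * left, gain * right
```

With flat magnitude responses, zero delay and a distance of one meter, both outputs equal the input, which gives a simple sanity check before measured HRTF data are substituted.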

1.6.2  Artificial  Neural  Networks.  Current  models  of  producing  3D  sound 
from  the  HRTF  require  lengthy  computations  that  cannot  be  accomplished  in  real 
time  with  a  general  purpose  computer  (26).  Although  two  special  purpose  computers 
have  been  developed,  the  DIRAD  and  Convolvotron,  this  thesis  will  investigate  the 
use of artificial neural networks to approximate the HRTF (23). Along with
approximating the HRTFs, the networks will provide easy interpolation between the known
HRTF  data  points.  They  may  also  provide  insight  into  the  underlying  function  of  the 
HRTF.  In  other  words,  the  networks  may  show  that  there  is  a  less  complex  function 
for  the  HRTF  than  what  is  represented  by  the  set  of  AAMRL  HRTF  data. 
1.6.2.1 Multilayer Perceptron. Two types of artificial neural networks
will be studied. The first is the Multilayer Perceptron (MLP). It has been shown
by Cybenko that the MLP can approximate any continuous function to arbitrary
accuracy (20). Figure 8 shows a three layer MLP (3:10:10:1). The input layer has
three nodes corresponding to azimuth, elevation and frequency. The next two
layers, the hidden layers, have 10 nodes each, called hidden nodes. The last
layer, the output layer, has one node corresponding to the HRTF. The number of
nodes in the hidden layers is arbitrary; however, Neti, Young and Schneider
found that an MLP network with fewer than 10 nodes in the hidden layers was
sufficient to output direction from input sound (HRTF) (16). Although this
thesis will reverse the inputs and outputs (input direction and output HRTF), it
seems reasonable to start the investigation with a similar network architecture
because it will basically be an inverse transformation of the Neti, Young and
Schneider MLP.

Figure 8. Multilayer Perceptron (MLP) neural network with two hidden layers of
10 nodes each (inputs: azimuth, elevation, frequency)
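The 3:10:10:1 architecture described above can be written out as a short forward pass. This sketch only illustrates the shape of the network; the sigmoid hidden units, the linear output, the random initial weights and the assumption that inputs are scaled to a small range are choices made for the example and are not taken from the thesis or from LNKmap.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer of connections."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def mlp_forward(params, x):
    """Propagate inputs (rows of azimuth, elevation, frequency)."""
    h = np.atleast_2d(np.asarray(x, dtype=float))
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = 1.0 / (1.0 + np.exp(-h))  # sigmoid on hidden layers only
    return h

# Three inputs, two hidden layers of 10 nodes, one output.
params = init_mlp([3, 10, 10, 1])
n_trainable = sum(w.size + b.size for w, b in params)
```

Counting the entries of `params` shows the three layers of weights that must be trained, 161 values in total when biases are included.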
The MLP will be trained with HRTF data from Smith’s thesis (25). The data
contain 25,296 HRTF values taken from 272 speaker locations at 93 different
frequencies. The speaker locations range from 0° to 360° in azimuth and -90° to
90° in elevation. The frequencies range from 100 Hz to 20,000 Hz. A drawback of
the MLP is the time required to train the network: three layers of weights have
to be trained. Therefore, the second network to be studied is the
Radial-Basis-Function (RBF) artificial neural network.

1.6.2.2 Radial-Basis-Function. RBF networks are broad enough for
universal approximation (19). The RBF is a two layer network. Figure 9 shows an
RBF network with 8 nodes in the hidden layer. Again, the input layer has three
nodes corresponding to azimuth, elevation and frequency. The next layer, the
hidden layer, has 8 nodes corresponding to the centroids of the clustering
algorithm. The last layer, the output layer, has one node corresponding to the
HRTF output. The RBF will be trained with the same data as the MLP above.

Figure 9. Radial-Basis-Function (RBF) neural network (inputs: azimuth,
elevation, frequency)
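A minimal sketch of an RBF network of this shape is given below: eight hidden nodes placed at cluster centroids, followed by a single linear output node. The text does not specify the clustering algorithm, basis function or training rule, so the plain k-means step, the Gaussian basis, the fixed `width`, and the least-squares fit of the output weights are all assumptions made for illustration.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means; returns k centroids (assumed clustering step)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def rbf_design(x, centers, width):
    """Gaussian hidden-layer activations plus a constant bias column."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(x), 1))])

def fit_rbf(x, y, k=8, width=0.3):
    """Place k centers, then solve the output weights by least squares."""
    centers = kmeans(x, k)
    design = rbf_design(x, centers, width)
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return centers, weights

def rbf_predict(x, centers, weights, width=0.3):
    return rbf_design(x, centers, width) @ weights
```

Because only the output weights are fitted once the centers are fixed, training reduces to a single linear solve, which is the practical attraction of the RBF network over the MLP noted above.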
1.6.3 LNKmap. The computer program LNKmap allows the user to train a
number of different artificial neural networks on a SUN workstation. Both the
MLP and RBF networks are available on this system. Since the HRTF is a
complicated function to approximate, bugs in the LNKmap program would be hard to
spot. Therefore, relatively simple functions were tested first to verify the
inner workings of the LNKmap program. For example, a one input, one output
function like $y = \cos(2\pi x)$ (figure 10) and a two input, one output
function like $z = \cos(2\pi(x^2 - y^2))$ (figure 11) were trained and analyzed.
From the training of these simple functions, insight into the workings of
LNKmap was gained. If LNKmap could not have been trained on one of these simple
functions, it would not have been able to be trained on a complicated function
like the HRTF. If this had been the case, other programs would have been
investigated to train the networks. Once the program was successfully trained on
the simple functions, the process of training on the HRTF data was begun.

Figure 10. Training function $y = \cos(2\pi x)$

Figure 11. Training function $z = \cos(2\pi(x^2 - y^2))$
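The two verification functions can be sampled as below to produce input and target arrays for a function approximation test. The sampling ranges, grid sizes and function names are assumptions for this sketch; the file format expected by LNKmap is not described here and is not reproduced.

```python
import numpy as np

def cosine_1d(n=101):
    """Samples of y = cos(2*pi*x) on an assumed range 0 <= x <= 1."""
    x = np.linspace(0.0, 1.0, n)
    return x, np.cos(2.0 * np.pi * x)

def cosine_2d(n=41):
    """Samples of z = cos(2*pi*(x^2 - y^2)) on an assumed square grid."""
    axis = np.linspace(-1.0, 1.0, n)
    x, y = np.meshgrid(axis, axis)
    z = np.cos(2.0 * np.pi * (x ** 2 - y ** 2))
    inputs = np.column_stack([x.ravel(), y.ravel()])
    return inputs, z.ravel()
```

Along the diagonals where $x^2 = y^2$ the second function equals one, which gives an easy visual check on a trained network's output surface.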


1.7  Thesis  Outline 

Chapter  one  is  an  introduction  to  this  thesis.  It  presents  a  brief  background 
of  auditory  displays  which  leads  into  the  problem  statement  for  this  thesis.  After 
that  it  defines  key  terms  and  research  objectives.  Finally  it  presents  an  approach  to 
synthesizing  3D  cues  and  creating  artificial  neural  networks. 

Chapter  two  continues  the  discussion  of  visualizing  data  that  was  introduced 
in  chapter  one.  It  also  contains  a  review  of  past  research  in  sound  localization. 
Specifically,  the  work  of  Oldfield  and  Parker,  and  Begault  and  Wenzel  is  presented 
in  great  detail  because  of  their  direct  correlation  with  this  thesis.  This  chapter 
finally  presents  a  historical  record  of  the  learning  of  the  program  LNKmap  and  the 
background  needed  to  train  an  ANN  with  the  HRTF  data. 

The previous chapters explored the background of 3D sound and ANNs. Chapter
three combines these two concepts to train an ANN to approximate the 3D sound
HRTF. This chapter presents the methodology used in training an MLP and an RBF
ANN. It also presents the objectives and methodology of two experiments.
Finally, it presents a detailed analysis of variance model used to analyze the
experimental results.

Chapter  four  presents  the  results  of  the  two  experiments  conducted  during 
this  thesis  work.  Before  presenting  the  results  of  each  experiment  a  discussion  of  the 
types  of  errors  analyzed  in  the  experiments  is  presented. 

Chapter five presents the summary, conclusions and recommendations for
future research.

II. Background


2.1 Introduction


The  visualization  of  the  data  is  important  to  help  determine  its  characteristics. 
Viewing  the  HRTF  data  is  a  classic  case  of  how  to  visualize  multidimensional  data. 
The  HRTF  data  consists  of  25,296  data  points.  Broken  down,  this  consists  of  272 
azimuth  and  elevation  locations:  0°  to  350°  in  azimuth  and  -82°  to  82°  in  elevation. 
The  azimuths  and  elevations  are  spaced  so  that  they  are  approximately  15°  apart  over 
the whole sphere (see figure 12). For each azimuth and elevation pair, 93
frequencies ranging from 100 to 19,952 Hz are used, so the HRTF data consist of
four axes: azimuth, elevation, frequency and response. Figure 13 shows the first
1000 HRTF samples broken down into the four axes. Each line represents the value
of the axis for that sample. For example, the 500th sample has approximately a
response of 0.05, an elevation of 70°, an azimuth of 100° and a frequency of
1000 Hz.

Figure 12. 272 speaker locations around the head
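The size of the data set follows directly from the counts given above, as the short calculation below shows. The frequency spacing is not stated in the text; the logarithmic grid of 40 points per decade used here is an inference, adopted only because it reproduces the quoted end points (100 Hz and about 19,952 Hz). The constant and variable names are hypothetical.

```python
import numpy as np

N_LOCATIONS = 272     # speaker positions (azimuth and elevation pairs)
N_FREQUENCIES = 93    # frequencies measured at each position
N_ZERO_ELEVATION = 24 # azimuth positions at zero elevation

# Assumed grid: logarithmic spacing, 40 points per decade from 100 Hz.
frequencies = 100.0 * 10.0 ** (np.arange(N_FREQUENCIES) / 40.0)

total_samples = N_LOCATIONS * N_FREQUENCIES
zero_elevation_samples = N_ZERO_ELEVATION * N_FREQUENCIES
```

Under this assumed spacing the grid also contains values close to 119 Hz and 5623 Hz, but the measured frequency list should be taken from the data themselves.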


Figure 13. HRTF samples from 0 to 1000

2.2 Graphical Display

Matlab is a powerful tool to help visualize the data. One approach to
visualizing 4 dimensions is to assign the first three dimensions to the x, y,
and z axes and to map the 4th dimension to color. Matlab has a number of
commands to do this.

By  using  the  sphere  command,  a  sphere  around  the  head  can  be  color  coded 
with  the  HRTF  value.  The  grey  scale  level  indicates  the  magnitude  of  the  HRTF. 
This  display  uses  only  the  azimuth,  elevation  and  response  data.  A  separate  sphere 
is  needed  for  each  frequency.  Figure  14  shows  left  ear  HRTF  data  around  the  head 
for  a  frequency  of  5623  Hz.  Note  the  lighter  shades  of  grey  on  the  left  side  of  the 
head  versus  the  right  side  of  the  head.  This  is  because  the  left  ear  has  a  direct  path 
from  the  sound  source.  Figure  15  shows  most  of  the  frequencies  for  the  left  side  of 
the  head  only.  The  frequency  increases  from  119  Hz  to  19,953  Hz.  These  two  figures 
show  graphically  that  the  HRTF  magnitude  varies  both  as  a  function  of  location  and 
frequency. 
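The mapping behind such a sphere display can be sketched as follows: each azimuth and elevation pair is converted to a point on a sphere, and the response is rescaled to a grey level. This is an illustration only and does not reproduce the Matlab commands used for the figures; the axis convention (x forward, y to the listener's left, z up, with azimuth counter-clockwise as seen from above) and the linear grey scale are assumptions.

```python
import numpy as np

def sphere_xyz(azimuth_deg, elevation_deg, radius=1.0):
    """Convert azimuth and elevation in degrees to Cartesian points."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    x = radius * np.cos(el) * np.cos(az)  # toward the front of the face
    y = radius * np.cos(el) * np.sin(az)  # toward the listener's left
    z = radius * np.sin(el)               # upward
    return np.stack([x, y, z], axis=-1)

def grey_level(response):
    """Rescale responses to 0..1 so lighter shades mean larger values."""
    r = np.asarray(response, dtype=float)
    span = r.max() - r.min()
    if span == 0.0:
        return np.zeros_like(r)
    return (r - r.min()) / span
```

One sphere would be drawn per frequency, with `grey_level` applied to the responses measured at that frequency.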

Figure 14. Left ear HRTF data around the head for a frequency of 5623 Hz
(panels: right side of head, left side of head)

Similar to above where each plot consisted of a single frequency, figure 16
consists of a single elevation. The zero elevation data consist of 24 azimuth
points with 93 frequencies each, for a total of 2,232 data points. The response
has a number of peaks and ridges with a high point localized slightly in front
of the left ear at 3000 Hz. Both this view and the spherical view above have
limitations in the amount of data that can be presented graphically. The
following sections present a discussion of the presentation of data.

2.3  Graphical  and  Auditory  Displays 

Graphical and auditory presentations are similar. When data vary with time, a
graphical presentation such as animation can provide a powerful analysis tool
(6). In the same sense, sound must be presented in time to be heard. If the
length of the sound is too short, its pitch, tempo, rhythm, and other attributes
cannot be distinguished. Graphical symbols, known as icons, convey information
in a small amount of space. On many computers with graphical interfaces, icons
represent files, directories or functions. For example, a picture of a sheet of
paper can represent a text document, and a picture of a trash can may represent
a delete file function. “Earcons,” the audio counterpart of icons, similarly
convey information (4). The beep of the computer when it displays a warning
message is an earcon. The two tones of a door bell are an earcon conveying the
message that someone is at the door. In graphical presentation, color is often
used as the fourth dimension when presenting 4D data. Sound resolution is
similar to color resolution (6). Both the frequency of pitch and the frequency
of color are logarithmic (blue 435.8 nm, green 546.1 nm and red 700 nm) (9).
Octaves provide expression of logarithmic variance (6). On the other hand, sound
is different from a graphic display. As discussed above, the graphical
presentation of multivariate data is commonly limited to three or four
dimensions. In contrast, the human ear can discriminate between any two of
400,000 different sounds presented in rapid succession. In addition, most people
can identify 49 different sounds at one time (6). The different sounds can be
used to represent different dimensions of data. For example, pitch, volume,
duration, attack, fundamental wave shape and fifth harmonic wave shape can be
used to represent 6 different dimensions. Yeung used nine to twenty dimensions
of sound (6). Matthew used a combination of auditory and visual presentations of
data (6). Matthew had five dimensions: x and y visually, and frequency, timbre
and amplitude phonically. Smith, Bergeron and Grinstein also combined auditory
and visual presentations of data (26). In their scheme, the auditory data
representation is triggered by the position of the mouse on the visual display.
Most of the past research has been with 1D and 2D sound. Although Bly did not
use 3D sound in her thesis, she suggests using 3D sound for the visualization of
three dimensional data (6). “Several sound characteristics were not used and
these deserve attention in the future. Certainly location is an easily
detectable attribute of sound.” (6:40)

Figure 15. Left ear HRTF data around the left side of head for frequencies
119 Hz to 19,953 Hz

Figure 16. Zero elevation HRTF data for the left ear. Frequency and response
shown in logarithmic scale

2.4  Acuity  of  Sound  Localization  I 

In an effort to investigate the acuity of sound localization, Oldfield and Parker
conducted a study of 8 subjects who judged the apparent spatial location of white
noise through speakers over a range of -40° to +40° in elevation and 0° to 180° in
azimuth (17). There are two major binaural cues: Interaural Time Difference (ITD)
and Interaural Intensity Difference (IID).

2.4.1  Interaural  Time  Difference. 

•  Interaural Time Difference is a function of the distance between the two ears,
the angle of incidence of the incoming sound, and the frequency of the sound (25).

•  ITD is frequency independent below approximately 500 Hz and above approximately
3000 Hz; furthermore, between these frequencies ITD decreases as frequency
increases. For azimuth angles of incidence less than 60 degrees left or
right of the front of the listener (0 degrees), the minimum ITD occurs between
1400 and 1600 Hz (25, 11:165-166).

•  The  most  significant  changes  in  ITD  occur  at  angles  of  incidence  between  0 
and  30  degrees,  consistent  with  the  fact  that  humans  perceive  a  sound  best 
when  the  source  is  directly  in  front  of  the  listener  (25,  1:700). 
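
The dependence of ITD on head size and angle of incidence can be illustrated with a minimal sketch. It assumes the Woodworth rigid-sphere approximation, ITD = (a/c)(θ + sin θ), which is not taken from the studies cited above and ignores the frequency dependence they describe; the head radius and speed of sound are assumed typical values, and the function name is hypothetical.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds for a rigid spherical head (Woodworth approximation).

    Intended for azimuths from 0 (straight ahead) to 90 degrees (directly
    to one side). Head radius and speed of sound are assumed values.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))


for az in (0, 15, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

Under this simplified model the ITD changes more between 0 and 30 degrees than between 60 and 90 degrees, which agrees with the observation above that the most significant changes occur near the front of the listener.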

2.4.2  Interaural  Intensity  Difference. 

•  Interaural  Intensity  Difference  is  a  function  of  the  distance  between  the  two 
ears,  the  angle  of  incidence  of  the  incoming  sound,  and  the  frequency  of  the 
sound.  IID  can  also  be  affected  by  size  and  shape  of  the  torso,  head,  and  ears. 
The  torso  affects  primarily  the  low  frequencies  (25). 

•  The  most  significant  changes  in  IID  occur  at  angles  of  incidence  between  0  and 
30  degrees,  consistent  with  the  fact  that  humans  perceive  a  sound  best  when 
the  source  is  directly  in  front  of  the  listener  (25,  1:700). 

•  Improvement in localization capabilities of humans at frequencies above 3000
Hz is a result of improvement in the IID cue (25, 11:166).

•  At  frequencies  above  8000  Hz,  IID  cues  increase  human  capabilities  to  localize 
in  elevation  as  well  as  in  azimuth  (25,  13:101). 
From these two cues, there is a range of positions which could account for
the  same  pattern  of  interaural  differences.  In  a  simplified  model  of  the  head,  these 
positions  form  a  cone.  Watkins  calls  it  the  “cone  of  confusion”  (29).  Oldfield  and 
Parker  presented  a  second  set  of  cues  which  they  describe  as  spectral  modifications 
by  the  pinna  and  head.  These  spectral  modifications  are  what  Begault  and  Wenzel 
call  the  head  related  transfer  function  (HRTF)  (3).  Past  studies  show  evidence  of 
non-binaural  cues  (HRTFs)  (17).  These  studies  use  sound  sources  in  the  median 
sagittal  plane.  The  median  sagittal  plane  divides  the  body  into  left  and  right  sides. 
Other  studies  blocked  the  pinnae  or  low  pass  filtered  the  sound  sources  and  confirmed 
that  localization  ability  is  lost.  Few  studies  before  1984  use  sound  sources  outside 
the  median  sagittal  plane  (17). 

2.4.3 Experiment 1. The Oldfield and Parker experiment was conducted
in  an  anechoic  chamber  for  frequencies  down  to  250  Hz.  The  apparatus  for  the 
experiment  consisted  of  a  boom  for  positioning  the  sound  source,  a  photographic 
recording  system  and  pointing  gun.  The  subject’s  head  was  held  stationary.  The 
importance  of  head  motion  in  human  auditory  localization  cannot  be  overstated. 
Without  head  motion,  a  person’s  head  and  ears  can  still  act  as  a  directional  antenna; 
however, sounds located at 0 degrees azimuth and various elevations (i.e. on the
median sagittal plane) can be extremely difficult to localize. Front/back and up/down
confusions  result  when  head  motion  is  not  allowed;  this  occurs  because  the  spectral 
and  temporal  cues  are  very  similar  at  both  ears  (31,  5,  25).  The  implementation 
of  head  motion  into  a  binaural  room  simulation  requires  that  a  real-time  system  be 
used  (23).  For  this  reason,  head  motion  was  not  implemented  into  the  non-real-time 
simulation  done  for  this  thesis. 

The sound was white noise which the subject could activate through a handheld
switch. The subject used a pointing gun to “shoot” the sound. The position
was recorded with the photographic recording system. The advantage of pointing is
that the response is not limited to preset locations. The disadvantage of pointing is
the limitation of the accuracy with which the hand can be positioned. Fitts found
that blind-positioning movements are most accurate in the dead-ahead, center and
lower tier positions and least accurate at the side and upper tier positions (22, 7). To
avoid pointing inaccuracies in this thesis, the sound locations were fixed and subjects
were required to pick preset locations.

2.4.4 Experiment 2. To validate the pointing technique, Oldfield and
Parker conducted a second experiment to measure the motor skills of the subjects.
The apparatus and procedures were the same as in experiment 1 except that the
subjects were not blindfolded and the chin brace was loosely fitted. The subjects
were allowed to see the speaker position before the sound was presented. Next the
room was completely darkened and the sound was presented for the subject to “shoot.”
They concluded that the biases in the pointing response did not affect the overall
outcome of the results. Reversals were never observed in the visual task (17).

2.4.4.1 Results. Azimuth error and elevation error were measured.
The absolute and algebraic forms of these were computed. Absolute error measured
general acuity and algebraic error measured directional bias. For algebraic error,
positions perceived higher and behind were termed positive errors (actual position
minus perceived position). Front/back reversals were analyzed separately: “a straight
subtraction of the actual and perceived sound position has not been regarded by
researchers as an honest representation of error (17, 28:587).” This thesis also analyzed
reversals.
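
The distinction between absolute error, algebraic error and reversals can be sketched as follows. This is an illustration only, not the procedure of Oldfield and Parker or of this thesis: it assumes azimuth runs from 0° (front) through 90° (side) to 180° (back), counts any response in the opposite front/back hemisphere as a reversal, and corrects a reversal by mirroring the response about 90°. All function names are hypothetical.

```python
def hemisphere(azimuth_deg):
    """Classify an azimuth (0 = front, 90 = side, 180 = back)."""
    if azimuth_deg < 90:
        return "front"
    if azimuth_deg > 90:
        return "back"
    return "side"


def is_reversal(actual_deg, perceived_deg):
    """True when the response lies in the opposite front/back hemisphere."""
    a, p = hemisphere(actual_deg), hemisphere(perceived_deg)
    return "side" not in (a, p) and a != p


def summarize(trials):
    """Summarize (actual, perceived) azimuth pairs, in degrees.

    Reversed responses are mirrored about 90 degrees before the error is
    taken, so they are counted separately instead of inflating the error.
    """
    reversals = 0
    algebraic, absolute = [], []
    for actual, perceived in trials:
        if is_reversal(actual, perceived):
            reversals += 1
            perceived = 180 - perceived      # reversal correction (assumed)
        error = actual - perceived           # actual minus perceived
        algebraic.append(error)
        absolute.append(abs(error))
    n = len(trials)
    return {
        "mean_absolute": sum(absolute) / n,  # general acuity
        "mean_algebraic": sum(algebraic) / n,  # directional bias
        "reversals": reversals,
    }


print(summarize([(30, 150), (30, 40), (120, 110)]))
```

In the three sample trials the first response is a front/back reversal; after correction its error is zero, which shows why corrected results have lower error than a straight subtraction would give.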

There  were  few  front/back  reversals.  Most  occurred  in  the  upper  back  quad¬ 
rant.  One  exception  to  this  was  the  region  near  90°  azimuth  where  the  reversals  were 
uniform  over  elevation.  Another  kind  of  error  was  what  Oldfield  and  Parker  called 
“defaults  to  90°”.  Like  reversals  the  defaults  occurred  in  the  upper  back  quadrant. 
Defaults  tended  to  boost  the  error  in  the  upper  back  quadrant,  whereas  reversals 
and  non-reversals  had  the  same  amount  of  error. 
2.4.5 Discussion. The results support the model of the cone-of-confusion
in the case of elevation error biases affecting azimuth error biases. Both azimuth
and elevation discrimination are less accurate behind the head. Oldfield and Parker
suggest that sounds coming from behind the head do not directionally interact with
the convolutions of the pinna. Front/back discrimination derives from the pinna
when the head is held still. The higher number of reversals in the back region is
consistent with a reduction in pinna cues. Binaural differences cannot be separated
from pinna modifications because the modifications are already in the signal before
any binaural differences can be processed by the brain.

2.5 Acuity of Sound Localization II

Oldfield and Parker also conducted a study of 4 subjects who judged the apparent
spatial location of white noise through speakers over a range of -40° to +40°
in elevation and 0° to 180° in azimuth (18). In this experiment the pinnae of the
subjects were blocked to simulate the effect of localization without the pinna. This
experiment relates directly to the first experiment conducted in this thesis,
where the no-HRTF condition is similar to the no-pinna condition. The implication is that
the determination of azimuth and elevation depends on more than just the binaural
differences from the sound source. An additional cue is the spectral modification
made by the pinna.

2.5.1 Experiment. The data collection apparatus was the same as described
above in part I of the study (17). Each subject was fitted with individually cast
molds for the pinna. An access hole remained to the auditory canal. Subjects were
presented 171 different speaker locations while blindfolded and with the head held stationary.

2.5.2  Results.  The  Oldfield  and  Parker  study  presented  the  relationship 
between  azimuth  error  and  elevation  error.  This  thesis  presents  only  azimuth  error. 
With the pinna blocked the overall absolute elevation error was almost double that
of azimuth error (21.9° and 11.9° respectively). The largest differences between
absolute  azimuth  and  elevation  errors  occurred  in  the  front  quadrant.  The  absolute 
azimuth  error  was  largest  between  120°  and  160°  azimuth  position.  Azimuth  error 
is  reasonably  uniform  and  elevation  error  is  less  uniform  with  larger  errors  near  the 
extremes.  The  mean  algebraic  elevation  and  mean  azimuth  errors  were  generally 
positive  in  the  front  and  negative  in  the  back.  The  subjects  perceived  the  sounds  in 
the  front  pulled  higher  and  toward  the  ears.  The  subjects  also  perceived  the  sounds 
in  the  back  pulled  lower  and  toward  the  ears.  Filling  the  convolutions  of  the  pinna 
has  the  effect  of  displacing  elevation  judgements  toward  0°. 

A comparison of the results of the pinna filled versus unfilled (normal) showed
that there was an increase in elevation error in the pinna-filled case (filled 21.9° versus
normal 8.4°). The azimuth error also increased in the pinna-filled case, but not as
much (filled 11.9° versus normal 9.3°). The mean absolute azimuth error for the
filled pinna was generally lower than for the normal pinna except at azimuth positions
140° to 160°. The error did not change significantly by azimuth position; however, it
did change significantly by elevation position.

A  number  of  subjects  made  front/back  reversals.  Reversals  accounted  for  26% 
of  the  responses:  31%  were  front/back  reversals  and  21%  were  back/front  reversals. 
These  results  are  compared  with  the  findings  in  this  thesis  in  chapter  4. 

2.6 Headphone Localization of Speech

A study of 11 inexperienced subjects who judged the apparent spatial location
of headphone-presented speech stimuli filtered with non-individualized head-related
transfer functions (HRTFs) was conducted by Begault and Wenzel (3). This is in
contrast with the Oldfield and Parker study, which used experienced subjects listening
to white noise from an actual sound source in three dimensional space.

HRTFs are “the listener-specific, direction-dependent acoustic effects imposed
on an incoming signal by the pinnae (3:361-362).” It should be noted that not only
the pinnae, but also the head and torso affect the incoming signal. Note also that
this definition says “listener-specific”. Although HRTFs are “listener-specific”, they
generalize to other people. The main thrust of the Begault and Wenzel study is
to investigate how non-individualized HRTFs work. The HRTFs used are from the
pinnae of a good localizer (or as Dr. Rogers would say, a person with a “golden ear”).
The study decides against simple averages of HRTFs to avoid possible elimination
of distinctive spectral features. This thesis used AAMRL HRTFs (12).

Eleven adults were used in the Begault and Wenzel study. They were screened
by oral questions about hearing loss, recent exposure to loud noises and work
environment. There is no apparent relation between audiometric measurements and
binaural performance (3). Therefore this thesis forgoes the audiometric measurement
of subjects.

The  sounds  used  in  the  study  were  from  a  set  of  45  one-  or  two-syllable  words 
representing  a  particular  phoneme.  The  sounds  were  filtered  with  the  HRTFs  to 
simulate  sounds  coming  from  0°  elevation  and  0°,  ±30°,  ±60°,  ±90°,  ±120°,  ±150° 
and  180°  azimuth.  Similarly,  this  thesis  used  the  male  voice  utterance  “seventeen” 
presented  at  0°  elevation  and  24  azimuth  locations.  Begault  and  Wenzel  conclude 
that  most  listeners  can  obtain  useful  directional  information  from  speech  without 
requiring  individual  HRTFs.  Taking  this  result  a  step  further,  this  thesis  investigated 
the  hypothesis  of  obtaining  useful  directional  information  from  speech  filtered  with 
a  neural  network  trained  HRTF. 

2.7 Learning LNKmap with a simple function

2.7.1  Linear  function.  During  the  spring  of  1994,  the  program  LNKnet 
was  gaining  popularity  at  AFIT  as  a  neural  network  classifying  tool.  LNKnet’s 
companion  program  for  function  approximation  (LNKmap)  had  not  yet  been  fully 
investigated. To begin the exploration of LNKmap, a simple function, y = x, was
input into LNKmap.
Since LNKmap was similar to LNKnet, the file structure was assumed to be
the same. Therefore the input data file should be in two columns with the output
values in the first column and the input values in the second column. A small file
was used starting with 10 points between zero and one (see table 1). Since the
function is linear, a linear output node function was used. Most network users call
the “output node function” a “non-linearity function”. A common starting point is
a 3 layer multi-layer perceptron (MLP) with 1 input node, 1 hidden node and 1 output
node, notated 1:1:1 (figure 17). Some researchers refer to this architecture as a 2
layer net, referring to the number of weight layers. This thesis uses the
colon notation wherever confusion may exist. Many classifiers present the data
in random order; however, for function approximation it is best to present the data
in order (21). With the training set described above the net learned the function in
about 100 epochs.

Table 1. Simple linear function

Figure 17. A 3 layer net, 1 input node, 1 hidden node and 1 output node notated
1:1:1.
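
The 1:1:1 experiment can be approximated with a minimal sketch. It is not LNKmap's algorithm: it assumes a sigmoid hidden node, a linear output node, plain per-sample gradient descent, fixed starting weights and an arbitrary learning rate, and every name in it is hypothetical. The data is presented in file order, as recommended above for function approximation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train_1_1_1(samples, epochs=100, rate=0.2):
    """Train a 1:1:1 net and return (weights, mean squared error history).

    history[0] is the error before training; one entry is added per epoch.
    """
    w1, b1, w2, b2 = 0.5, 0.0, 0.5, 0.0   # fixed start for repeatability

    def mse():
        total = 0.0
        for x, y in samples:
            out = w2 * sigmoid(w1 * x + b1) + b2
            total += (y - out) ** 2
        return total / len(samples)

    history = [mse()]
    for _ in range(epochs):
        for x, y in samples:              # presented in order, not shuffled
            h = sigmoid(w1 * x + b1)      # hidden node output
            out = w2 * h + b2             # linear output node
            err = out - y
            grad_h = err * w2 * h * (1.0 - h)   # error passed back to hidden node
            w2 -= rate * err * h
            b2 -= rate * err
            w1 -= rate * grad_h * x
            b1 -= rate * grad_h
        history.append(mse())
    return (w1, b1, w2, b2), history


samples = [(i / 10, i / 10) for i in range(1, 11)]   # ten points of y = x
weights, history = train_1_1_1(samples)
print("error before:", history[0], "after 100 epochs:", history[-1])
```

The number of epochs needed depends on the learning rate and starting weights, so the sketch should not be expected to reproduce the epoch counts reported here.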
Figure 18. y = x (solid) and MLP (1:1:1) output (dash) after 100 epochs

Next a standard sigmoid was tried on the same data while also increasing the
number of hidden nodes to 5 (1:5:1). The net learned the function in about 1000
epochs (see figure 19). In an effort to get a lower error a second hidden layer with
one node (1:5:1:1) was added. This architecture is shown in figure 20. After 1000
epochs, the net had not learned the function. More nodes were added to the second
hidden layer (1:5:5:1). After another 1000 epochs, the net had not learned the function.

Adding more nodes seems to slow the learning process; therefore one hidden
layer was used in subsequent trials. With an architecture of 1:10:1 and a standard
sigmoid  output  node  function,  the  net  learned  the  function  (see  figure  21).  With  the 
same  architecture  and  using  the  symmetric  sigmoid  output  node  function,  the  net 
learned  the  function  with  an  even  lower  error  (see  figure  22). 

Figure 19. y = x (solid) and MLP (1:5:1) output (dash) after 1000 epochs

Figure 20. A 4 layer net, 1 input node, 5 hidden nodes, 1 hidden node and 1 output
node notated 1:5:1:1.

Figure 21. y = x (solid) and MLP (1:10:1) output (dash) after 1000 epochs

Figure 22. y = x (solid) and MLP (1:10:1) output (dash) after 300 epochs

2.7.2 Non-linear functions. At this point, some nonlinear data was tried.
The 10 point data was modified to be low near the end points and higher in the
middle (y ≈ -x², see table 2 and figure 23). The LNKmap plot of the data
approximated the function x ≈ -y² (see figure 24). This indicated that the columns
of the data file were reversed. This discovery revealed that LNKmap needed the
data presented in an ASCII matrix where the first columns were the input parameters
and the last columns were the output parameters. This is contrary to LNKnet where
the first columns are the class. The columns were corrected and training
of a number of net configurations was resumed. The net could only match a straight
line to the data, which was attributed to the low number of data points.

    y      x
   0.1    0.1
   0.4    0.2
   0.6    0.3
   0.9    0.4
   1.0    0.5
   1.0    0.6
   0.8    0.7
   0.5    0.8
   0.3    0.9
   0.1    1.0

Table 2. Simple nonlinear function

Figure 23. y ≈ -x² (solid) and MLP (1:10:1) output (dash)

Figure 24. x ≈ -y² (solid) and MLP (1:10:1) output (dash)
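
The corrected column order can be illustrated with a short sketch that writes the nonlinear data with the input column first and the output column last. Only the column order comes from the text; the whitespace delimiter, the number formatting and the function name are assumptions, and the actual LNKmap file may require more than is shown here.

```python
def format_rows(inputs, outputs):
    """Return ASCII lines with input columns first and output columns last."""
    lines = []
    for x_row, y_row in zip(inputs, outputs):
        values = list(x_row) + list(y_row)
        lines.append(" ".join(f"{v:g}" for v in values))
    return lines


# The ten points of the nonlinear data: x is the input, y is the output.
xs = [[i / 10] for i in range(1, 11)]
ys = [[0.1], [0.4], [0.6], [0.9], [1.0], [1.0], [0.8], [0.5], [0.3], [0.1]]

rows = format_rows(xs, ys)
print("\n".join(rows))
```

Swapping the two arguments reproduces the reversed file that led LNKmap to plot x as a function of y.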

2.8 Learning LNKmap with a Cosine function

After  determining  the  correct  file  format,  a  trigonometric  function  was  tested. 
The function to be trained was y = cos(2πx), where x ranged from 0 to 1 with
a step size of approximately 1/628 for a total of 629 points. Using the same methodology
as before, starting with a net architecture of 1:1:1 and a linear output node function,
many configurations were tried, settling on two hidden layers with 10 nodes each
(1:10:10:1). Through all these trials, the current experiment was always continued
and the data was presented randomly. Although, as mentioned above, the data should
be presented in order, the strategy at this point in the investigation of LNKmap was
to follow the same procedures as LNKnet. The result of the training was that the
net learned the first half of the cosine curve and then proceeded asymptotically to a
line parallel to the x axis for the second half of the curve. The approximate number
of epochs at this point was 10,000 to 20,000.

At  this  point  it  was  concluded  that  random  presentation  of  the  data  was  not 
working  and  the  data  was  presented  in  the  order  of  the  input  file  (x  increasing  from 
zero  to  one).  The  net  learned  the  function. 

In  order  to  duplicate  the  results,  Capt  Smith  used  the  same  cosine  training 
file.  He  trained  the  net  for  3000  epochs  using  random  presentation  of  the  data.  The 
net learned the first half as above. Next he continued training the net with the
data presented in order. The net learned the whole cosine function. Unfortunately, plots of
this  data  are  not  available  because  the  LNKmap  program  has  since  been  changed  as 
discussed  below. 

2.8.1 Cosine Revisited. From early training attempts with LNKmap, it
was determined that LNKmap did not correctly train on data with more than one
input. The author of LNKmap was contacted and a fix was obtained. After this fix,
the cosine function was retrained using both an RBF and an MLP architecture. This
time the function trained was y = cos(x), where x ranged from 0 to 2π with a step
size of 0.01 for a total of 629 points (see figure 25). Before training, the data was
normalized to values between 0 and 1 using a Dynamic Range Compression (DRC)
algorithm (see figure 26). Given data in a matrix where each column represents a
variable, the DRC subtracts the minimum value of each column from each value
in the column and divides by the maximum value of the difference for each column.
The Matlab code can be seen in appendix C.
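
The DRC step described above amounts to a column-wise min-max scaling. The following Python sketch follows that description; it is not the Matlab code of appendix C, the function name is hypothetical, and the handling of a constant column (mapped to zero) is an assumption, since the text does not cover that case.

```python
def dynamic_range_compress(matrix):
    """Scale each column of a row-major matrix to the range 0 to 1."""
    scaled_columns = []
    for column in zip(*matrix):               # iterate over columns
        low = min(column)
        span = max(column) - low              # maximum of (value - minimum)
        if span == 0:
            # Constant column: assumed to map to zero (not covered in the text).
            scaled_columns.append([0.0] * len(column))
        else:
            scaled_columns.append([(v - low) / span for v in column])
    return [list(row) for row in zip(*scaled_columns)]   # back to rows


data = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
print(dynamic_range_compress(data))
```

Each column is scaled independently, so an input column spanning 0 to 2π and an output column spanning -1 to 1 both end up between 0 and 1.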

2.8.1.1 RBF Architecture. The first architecture used for training
was a Radial Basis Function (RBF) using a K-means clustering algorithm. The RBF
was trained using 1, 2, 8, 9 and 32 centers in the K-means clustering. Figure 27 shows
the result with 1 center in the K-means clustering. Figure 28 shows the result with
2 centers in the K-means clustering. Since the cosine function in the range 0 to 2π
is essentially unimodal, only one clustering center is needed. Forcing more than one
center increases the error. The LNKmap clustering algorithm does not place centers
in the same place. A different clustering algorithm may have placed the centers in
the same place. Figure 29 shows that adding more centers will reduce the error, but
at the expense of a larger network.
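
How K-means centers feed an RBF network can be shown with a generic sketch. LNKmap's internals are not described here, so everything below is an assumption for illustration: Lloyd's K-means on the scalar inputs, Gaussian basis functions with a heuristic common width, a bias term, and output weights found by regularized least squares. All names are hypothetical.

```python
import math


def kmeans_1d(xs, k, iterations=50):
    """Lloyd's algorithm on scalars; centers start spread over the sorted data."""
    data = sorted(xs)
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def basis(x, centers, width):
    """Gaussian activations for one input, plus a constant bias term."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers] + [1.0]


def solve(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_rbf(xs, ys, k, ridge=1e-8):
    """Return (centers, predict) for an RBF net with k K-means centers."""
    centers = kmeans_1d(xs, k)
    width = (max(xs) - min(xs)) / k           # heuristic width (assumed)
    rows = [basis(x, centers, width) for x in xs]
    n = k + 1
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    weights = solve(gram, rhs)

    def predict(x):
        return sum(w * a for w, a in zip(weights, basis(x, centers, width)))

    return centers, predict


def rms_error(predict, xs, ys):
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


xs = [0.01 * i for i in range(629)]           # 0 to about 2*pi, step 0.01
ys = [math.cos(x) for x in xs]
for k in (1, 2, 8):
    _, model = fit_rbf(xs, ys, k)
    print(k, "centers: RMS error", round(rms_error(model, xs, ys), 4))
```

Because the basis widths and the weight solver differ from LNKmap's, the errors printed by this sketch are not comparable with figures 27 to 29.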


Figure  27.  y  =  cos(x)  (solid)  and  RBF  (1:1:1)  output  (dash)  with  1  center  in  the 
K-means  clustering  algorithm 

2.8.1.2 MLP Architecture. Next an MLP architecture of 1:10:10:1
was  chosen  to  match  the  best  results  from  the  previous  cosine  training  (before  the 
software  fix).  The  MLP  learned  the  cosine  in  only  220  epochs.  Figure  30  shows  the 
results  after  training  for  15,  100  and  220  epochs  respectively. 

2.8.2 Conclusion. It has not been determined why the network needed the
data presented first randomly and then in order to learn the function. Presenting the
data only randomly or only in order did not produce satisfactory results. The obvious
answer is that a bug was in the program. After the fix the program performed as
expected: when the data was presented in order, the network learned the
function faster. Regardless, the major effort of this investigation was to learn how
to use LNKmap as a function approximation tool. This task was accomplished.

Figure 30. y = cos(x) (solid) and MLP (1:10:10:1) output (dash) after 15, 100 and
220 epochs

2.9  Learning  LNKmap  with  a  Two  Input  Cosine  function 

The next function tested was the “bird function,” z = cos(2π(x² - y²)), where
x and y ranged from 0 to 1 for a total of 10201 training points (see figure 31). This
function was chosen because of its simplicity and because its output values are
small, between -1 and 1, so no normalization was required. Expanding the axes of the
bird function actually shows the whole function to be a symmetric cross shape (see
figure 32).

Figure 31. z = cos(2π(x² - y²)) where x and y ranged from 0 to 1.

Figure 32. z = cos(2π(x² - y²)) where x and y ranged from -1 to 1.
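
The bird training set can be regenerated with a short sketch. The grid spacing of 0.01 is inferred from the 10201 = 101 × 101 training points and is therefore an assumption, as are the function names and the row layout (inputs first, output last).

```python
import math


def bird(x, y):
    """The two-input test function z = cos(2*pi*(x^2 - y^2))."""
    return math.cos(2 * math.pi * (x * x - y * y))


def bird_grid(steps=100):
    """Return (x, y, z) rows on a (steps + 1) by (steps + 1) grid over [0, 1]."""
    return [(i / steps, j / steps, bird(i / steps, j / steps))
            for i in range(steps + 1)
            for j in range(steps + 1)]


rows = bird_grid()
print(len(rows), "training points")
```

Along the diagonal x = y the argument of the cosine is zero, so the function equals 1 there; this ridge is one arm of the cross shape mentioned above.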


2.9.1 RBF Architecture. Two network architectures were tested with the
bird data. The first was a Radial Basis Function (RBF) using a K-means clustering
algorithm. The RBF was trained using 8, 64 and 128 centers in the K-means clustering
(see figure 33). Powers of two above 128 (i.e. 256, 512, etc.) caused
an error in the program. The best fit the LNKmap RBF algorithm achieved on the
bird data is shown in figure 33. The placement of the clustering centers can be seen
as bumps in the bird’s back. The RMS error for the training using 8, 64 and 128
centers is shown in figure 34. The highest error was seen near the edge of the graphs
where the function was constrained. Increasing the number of centers in this case
did not reduce the error. The mean error for the 64 centers case was 0.1229 while
the mean error for the 128 centers case was 0.1273. Figures 33 and 34 do not show
much difference between the 64 center and 128 center cases. To examine the effect of
doubling the number of clustering centers from 64 to 128, the difference between the
two outputs is shown in figure 35. The average difference was 9.3888 × 10⁻⁵. The
difference between the two RMS errors is also shown in figure 35. The average difference
of the RMS error was 0.0044.


Figure  33.  Output  of  RBF  with  8,  64  and  128  centers  in  the  K-means  clustering 
algorithm 


Figure  34.  RMS  error  of  RBF  with  8,  64  and  128  centers  in  the  K-means  clustering 
algorithm 


2.9.2  MLP  Architecture.  The  second  network  architecture  used  to  train  on 
the  bird  function  was  a  Multi-Layer  Perceptron  (MLP).  Three  MLP  architectures 
were  chosen  to  train  the  bird  data.  For  each  architecture,  the  non-linearity  function 
used was the symmetric sigmoid (see figure 36).

Figure 35. Difference between outputs (left) and RMS errors (right) using 64 and
128 centers in the K-means clustering algorithm

Figure 36. Symmetric sigmoid used for non-linearity function y = 2/(1 + e^(-x)) - 1

The first architecture had one hidden layer of 10 nodes (2:10:1); its RMS error is
shown in figure 38.

Figure 38. RMS error of MLP after 220 and 440 epochs with architecture of 2:10:1

The second architecture
had one hidden layer of 64 nodes (2:64:1). This architecture was chosen to compare
with  the  64  center  RBF.  Figure  39  shows  the  progress  of  learning  for  20,  50  and  100 
epochs  respectively.  The  RMS  error  is  shown  in  figure  40.  Comparing  figures  33  and 
39  it  is  clear  that  the  RBF  outperformed  this  architecture  in  both  speed  and  error 
even though the architectures have basically the same number of nodes.

Figure 40. RMS error of MLP after 20, 50 and 100 epochs with architecture of
2:64:1

The third architecture had two hidden layers of 10 nodes each (2:10:10:1). Figure 41 shows
the progress of learning for 20, 220 and 440 epochs respectively. The RMS error is
shown in figure 42.


Figure  41.  Output  of  MLP  after  20,  220  and  440  epochs  with  architecture  of 
2:10:10:1 


Figure  42.  RMS  error  of  MLP  after  20,  220  and  440  epochs  with  architecture  of 
2:10:10:1 


2.9.3 Conclusion. Comparing figures 33 to 41 with figure 31 (the original
bird data), it is clear that the MLP was able to learn the bird
data better than the RBF. However, it is important to note that this result does not
necessarily generalize to other data, and that different architectures produced different results.
The main purpose of the bird data tests was to verify that the LNKmap program
could approximate a function with more than one input.




2.10  Conclusion 


This chapter explored the background of both actual and simulated 3D sound.
It showed that natural errors occur (i.e. reversals) with both actual 3D and simulated
3D sound sources. The head, pinna and HRTF are important for 3D sound
localization. It also showed that non-individualized HRTFs can be successfully used
with speech.

The chapter also explored the use of LNKmap as a tool for function approximation.
LNKmap was able to successfully train a number of different neural network
architectures from simple linear and non-linear functions to two input trigonometric
functions.
III. Methodology


3. 1  Introduction 

The previous chapters explored the background of 3D sound and neural networks.
This chapter will combine the two concepts to train a neural network to
approximate the 3D sound HRTF.

3.2  Learning  HRTF  Data  for  Zero  Elevation 

3.2.1 MLP Architecture. Having shown that the LNKmap program works
correctly for multiple input data, the next step is to map the Head Related Transfer
Function (HRTF) data. The HRTF data consists of 25,296 data points: 272 azimuth
and elevation locations, from 10° to 350° in azimuth and -90° to 90° in elevation,
with 93 frequencies ranging from 100 to 20,000 Hz for each azimuth and elevation
pair. To simplify the training, only the zero elevation is used in
this thesis. The zero elevation data consists of 24 azimuth points with 93 frequencies
each, for a total of 2232 data points (see figure 43). Since the MLP worked the
best with the bird data, the zero elevation data was first trained with an MLP with an
architecture of 2:50:50:1. The output node function was a symmetric sigmoid. The
weights were updated after each trial (no batch). Figure 44 shows the results after
7235 epochs. The MLP heavily smooths the HRTF data. An experiment needs
to be constructed to determine if the jaggedness of the actual HRTF data is needed
to perceive 3D sound. This is not addressed in this thesis.

Figure 43. Zero elevation HRTF data for the left ear. Frequency and Response
shown in logarithm scale

Figure 44. Zero elevation MLP (2:50:50:1) after 7235 epochs. Frequency and Response
shown in logarithm scale

3.2.2 RBF Architecture. The RBF was tested with the zero elevation data
next. A K-means clustering algorithm with 256 centers was used. Compared with
the MLP, the RBF trained much faster and its results were even better from a visual
standpoint. Figure 45 shows the results of training with the RBF. Blank areas in
the figure are where the RBF output was negative or zero (a logarithmic scale requires
positive values). Since negative HRTFs are not possible, they will be zeroed for filter
design. The RBF is able to match the jaggedness of the actual HRTF data more
closely. Figure 46 shows the RMS error for the MLP and RBF networks.
A close inspection of figure 46 reveals that the maximum magnitude of the RBF error
(0.2707) was actually higher than that of the MLP error (0.1573). However, the MLP
had more error overall. The sums of the MLP and RBF errors were 7.3518 and 5.8528
respectively.

Figure 45. Zero elevation RBF with 256 centers. Frequency and Response shown
in logarithmic scale

Figure 46. Zero elevation RMS error for the MLP (left) and RBF (right)
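The two post-processing steps described above (zeroing negative network outputs and summarizing the error) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the use of the absolute difference as the per-point error is an assumption, since the exact error definition behind figure 46 is not stated here.

```python
def clip_negative(responses):
    """Replace negative network outputs with zero before filter design."""
    return [max(0.0, r) for r in responses]


def error_summary(target, predicted):
    """Return (largest absolute error, summed absolute error) over paired samples.

    Assumption: the per-point error is |target - predicted|.
    """
    errors = [abs(t - p) for t, p in zip(target, predicted)]
    return max(errors), sum(errors)
```

A network can have the larger peak error and still the smaller summed error, which is the pattern reported for the RBF relative to the MLP.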

3.2.3 Conclusion. It would be premature to generalize that the RBF works
better than the MLP for jagged data like the HRTF data as compared to smooth
data like the bird data. The MLP and RBF had two different architectures. The
MLP used 100 weights (50 weights per hidden node) while the RBF used 512 weights
(256 centers found by K-means and 256 weights), so the RBF used roughly five times
as many weights as the MLP. However, the RBF does have the desirable feature of
speed.

3.3 Experiment 1: With and Without HRTF

In the process of manipulating and filtering sound sources with the HRTF, an
obvious question arose: does the HRTF data help in 3D localization?
To answer this question, experiment one was conceived.

3.3.1 Objective. Determine whether or not the utilization of AAMRL’s
HRTF data for filter design provides a specific advantage over the use of only ITDs
in generation of a binaural signal. The filters were designed using the program ESPS
by Entropic Research Lab. The design was an FIR filter using the weighted mean
square error criterion. The weights corresponded to the AAMRL HRTF for the 93
frequencies at the test location. A separate filter was designed for each ear and for
each location tested.

3.3.2 Method. Twenty subjects were presented 48 binaural sounds through
a set of headphones. In this forced choice experiment, the subjects were asked to
select the location from which they perceived the sound. Subjects chose from a
display provided on a computer screen by moving the mouse to place the cursor over
the perceived location and clicking the button (see figure 47). The display had only
24 possible azimuths to choose from, and the elevation angle was always zero. These
24 locations corresponded to the 24 possible locations of the computer generated
binaural signals. The signal used in each case was either digitally filtered and
delayed appropriately for a given angle of azimuth, or simply delayed appropriately to
account for the angle. In either case, the original signal was the speech pattern for
a male utterance of the word “seventeen.”

Figure 47. Display provided on a computer screen for selection of the perceived
location

Subjects were asked to close their eyes during the data collection phase of this
experiment. Interaction effects between the subjects and the two fixed factors (angle
and method of signal generation) in this experiment are assumed to be zero. The
angle from which the signal was designed to come is called factor B, and has
24 levels (one for each azimuth). Factor A represents the method by which the
binaural signal was generated, and has 2 levels (one for HRTF combined with ITD
and one for ITD only). Factor C represents the subjects.

3.3.3 The Model. The model used in experiment one is an analysis of
variance (ANOVA) model for a reduced three-factor experiment (14). It is assumed
that the $a \cdot b \cdot c$ treatment combinations represent random samples of size $n$, where
$a = 2$, the number of filters; $b = 24$, the number of azimuths; $c = 20$, the number of
subjects; and $n = 1$, the sample size. The model is

$$x_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijkl} \tag{1}$$

where

$$i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, c; \quad l = 1, 2, \ldots, n$$

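The $x_{ijkl}$ are assumed to be normally distributed and the following restrictions
apply:

$$\sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0, \quad \sum_k \gamma_k = 0 \tag{2}$$

$$\sum_i (\alpha\beta)_{ij} = 0, \quad \sum_j (\alpha\beta)_{ij} = 0 \tag{3}$$

$$\sum_i (\alpha\gamma)_{ik} = 0, \quad \sum_k (\alpha\gamma)_{ik} = 0 \tag{4}$$

$$\sum_j (\beta\gamma)_{jk} = 0, \quad \sum_k (\beta\gamma)_{jk} = 0 \tag{5}$$

Here $\alpha_i$, $\beta_j$, and $\gamma_k$ represent the main effects due to the factors filter, azimuth and
subject respectively. The terms $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, and $(\beta\gamma)_{jk}$ represent interaction
between the factors. Let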


$\mu_{ijk\cdot}$ = mean for the $(ijk)$th treatment combination (6)

$\mu_{ij\cdot\cdot}$ = the average of the population means for those populations receiving
the $i$th level of A and the $j$th level of B

$$\mu_{ij\cdot\cdot} = \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{c} \tag{7}$$

$\mu_{i\cdot\cdot\cdot}$ = the average of the population means for those populations receiving
the $i$th level of A

$$\mu_{i\cdot\cdot\cdot} = \sum_{j=1}^{b} \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{bc} \tag{8}$$

$\mu$ = the average of the population means for all populations under consideration

$$\mu = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{abc} \tag{9}$$

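The other mean terms are defined in a similar manner:

$$\mu_{i\cdot k\cdot} = \sum_{j=1}^{b} \frac{\mu_{ijk\cdot}}{b} \tag{10}$$

$$\mu_{\cdot jk\cdot} = \sum_{i=1}^{a} \frac{\mu_{ijk\cdot}}{a} \tag{11}$$

$$\mu_{\cdot j\cdot\cdot} = \sum_{i=1}^{a} \sum_{k=1}^{c} \frac{\mu_{ijk\cdot}}{ac} \tag{12}$$

$$\mu_{\cdot\cdot k\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{\mu_{ijk\cdot}}{ab} \tag{13}$$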

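The main effects and interaction effects are defined as follows:

$$\alpha_i = \mu_{i\cdot\cdot\cdot} - \mu \tag{15}$$

$$\beta_j = \mu_{\cdot j\cdot\cdot} - \mu \tag{16}$$

$$\gamma_k = \mu_{\cdot\cdot k\cdot} - \mu \tag{17}$$

$$(\alpha\beta)_{ij} = \mu_{ij\cdot\cdot} - \mu_{i\cdot\cdot\cdot} - \mu_{\cdot j\cdot\cdot} + \mu \tag{18}$$

$$(\alpha\gamma)_{ik} = \mu_{i\cdot k\cdot} - \mu_{i\cdot\cdot\cdot} - \mu_{\cdot\cdot k\cdot} + \mu \tag{19}$$

$$(\beta\gamma)_{jk} = \mu_{\cdot jk\cdot} - \mu_{\cdot j\cdot\cdot} - \mu_{\cdot\cdot k\cdot} + \mu \tag{20}$$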


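The totals for the three-way ANOVA are shown next:

$T_{i\cdot\cdot\cdot}$ = sum of all responses from experimental units receiving level $i$ of A,
where $x$ is the absolute difference of the actual location and perceived location
of the sound

$$T_{i\cdot\cdot\cdot} = \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} x_{ijkl} \tag{21}$$

$T_{ij\cdot\cdot}$ = sum of all responses from experimental units receiving level $i$ of A
and level $j$ of B

$$T_{ij\cdot\cdot} = \sum_{k=1}^{c} \sum_{l=1}^{n} x_{ijkl} \tag{22}$$

The other total terms are defined in a similar manner:

$$T_{\cdot j\cdot\cdot} = \sum_{i=1}^{a} \sum_{k=1}^{c} \sum_{l=1}^{n} x_{ijkl} \tag{23}$$

$$T_{\cdot\cdot k\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{l=1}^{n} x_{ijkl} \tag{24}$$

$$T_{i\cdot k\cdot} = \sum_{j=1}^{b} \sum_{l=1}^{n} x_{ijkl} \tag{25}$$

$$T_{\cdot jk\cdot} = \sum_{i=1}^{a} \sum_{l=1}^{n} x_{ijkl} \tag{26}$$

$$T_{ijk\cdot} = \sum_{l=1}^{n} x_{ijkl} \tag{27}$$

$$T_{\cdot\cdot\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} x_{ijkl} \tag{28}$$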


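The sums of squares needed in the three-way ANOVA are:

$$SS_A = \sum_{i=1}^{a} \frac{T_{i\cdot\cdot\cdot}^2}{bcn} - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{29}$$

$$SS_B = \sum_{j=1}^{b} \frac{T_{\cdot j\cdot\cdot}^2}{acn} - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{30}$$

$$SS_C = \sum_{k=1}^{c} \frac{T_{\cdot\cdot k\cdot}^2}{abn} - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{31}$$

$$SS_{AB} = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{T_{ij\cdot\cdot}^2}{cn} - \sum_{i=1}^{a} \frac{T_{i\cdot\cdot\cdot}^2}{bcn} - \sum_{j=1}^{b} \frac{T_{\cdot j\cdot\cdot}^2}{acn} + \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{32}$$

$$SS_{AC} = \sum_{i=1}^{a} \sum_{k=1}^{c} \frac{T_{i\cdot k\cdot}^2}{bn} - \sum_{i=1}^{a} \frac{T_{i\cdot\cdot\cdot}^2}{bcn} - \sum_{k=1}^{c} \frac{T_{\cdot\cdot k\cdot}^2}{abn} + \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{33}$$

$$SS_{BC} = \sum_{j=1}^{b} \sum_{k=1}^{c} \frac{T_{\cdot jk\cdot}^2}{an} - \sum_{j=1}^{b} \frac{T_{\cdot j\cdot\cdot}^2}{acn} - \sum_{k=1}^{c} \frac{T_{\cdot\cdot k\cdot}^2}{abn} + \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{34}$$

$$SS_{Total} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} x_{ijkl}^2 - \frac{T_{\cdot\cdot\cdot\cdot}^2}{abcn} \tag{35}$$

$$SS_E = SS_{Total} - SS_A - SS_B - SS_C - SS_{AB} - SS_{AC} - SS_{BC} \tag{36}$$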

Finally, an ANOVA table as shown in table 3 is set up to test the null ($H_0$)
and alternative ($H_1$) hypotheses:

$H_0^{(1)}$: There is no interaction between factors A and B;
that is, $(\alpha\beta)_{ij} = 0$.

$H_0^{(2)}$: There is no interaction between factors A and C;
that is, $(\alpha\gamma)_{ik} = 0$.

$H_0^{(3)}$: There is no interaction between factors B and C;
that is, $(\beta\gamma)_{jk} = 0$.

$H_0^{(4)}$: Factor A means are equal; that is, $\alpha_i = 0$.

$H_0^{(5)}$: Factor B means are equal; that is, $\beta_j = 0$.

$H_0^{(6)}$: Factor C means are equal; that is, $\gamma_k = 0$.

$H_1^{(1-6)}$: The means are not equal.
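The totals and sums of squares above translate directly into code. The following Python sketch computes the reduced three-way ANOVA for one observation per cell ($n = 1$), with the error term obtained by subtraction as in equation (36). The function name and the nested-list layout `x[i][j][k]` are illustrative assumptions, and this is not the Matlab code of appendix C.

```python
from itertools import product


def reduced_three_way_anova(x):
    """ANOVA for x[i][j][k]: one response per (A=i, B=j, C=k) cell, n = 1.

    Returns (ss, df, ms, f): sums of squares, degrees of freedom,
    mean squares and F ratios against the error mean square.
    """
    a, b, c, n = len(x), len(x[0]), len(x[0][0]), 1
    cells = list(product(range(a), range(b), range(c)))
    grand = sum(x[i][j][k] for i, j, k in cells)
    cf = grand ** 2 / (a * b * c * n)  # correction term T....^2 / (abcn)

    # One-way and two-way totals (the T terms of the model).
    t_i = [sum(x[i][j][k] for j in range(b) for k in range(c)) for i in range(a)]
    t_j = [sum(x[i][j][k] for i in range(a) for k in range(c)) for j in range(b)]
    t_k = [sum(x[i][j][k] for i in range(a) for j in range(b)) for k in range(c)]
    t_ij = [sum(x[i][j][k] for k in range(c)) for i in range(a) for j in range(b)]
    t_ik = [sum(x[i][j][k] for j in range(b)) for i in range(a) for k in range(c)]
    t_jk = [sum(x[i][j][k] for i in range(a)) for j in range(b) for k in range(c)]

    def sq(totals, divisor):
        return sum(t * t for t in totals) / divisor

    s_a, s_b, s_c = sq(t_i, b * c * n), sq(t_j, a * c * n), sq(t_k, a * b * n)
    ss = {
        "A": s_a - cf,
        "B": s_b - cf,
        "C": s_c - cf,
        "AB": sq(t_ij, c * n) - s_a - s_b + cf,
        "AC": sq(t_ik, b * n) - s_a - s_c + cf,
        "BC": sq(t_jk, a * n) - s_b - s_c + cf,
        "Total": sum(x[i][j][k] ** 2 for i, j, k in cells) - cf,
    }
    effects = ("A", "B", "C", "AB", "AC", "BC")
    ss["E"] = ss["Total"] - sum(ss[e] for e in effects)

    df = {"A": a - 1, "B": b - 1, "C": c - 1,
          "AB": (a - 1) * (b - 1), "AC": (a - 1) * (c - 1),
          "BC": (b - 1) * (c - 1), "Total": a * b * c * n - 1}
    df["E"] = df["Total"] - sum(df[e] for e in effects)

    ms = {key: ss[key] / df[key] for key in effects + ("E",)}
    f = {e: ms[e] / ms["E"] for e in effects}
    return ss, df, ms, f
```

With $a = 2$, $b = 24$ and $c = 20$, the error degrees of freedom obtained by subtraction are 959 - 522 = 437. The sketch does not guard against a zero error mean square, which would occur for perfectly additive data.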

3.4  Experiment  2:  AAMRL  HRTF  and  RBF  HRTF 

3.4.1 Objective. Determine whether or not the utilization of AAMRL’s
HRTF data for filter design is statistically different from the utilization of RBF
neural network data for filter design in generation of a binaural signal.

Source of    Degrees of          Sum of         Mean                      Calculated
Variation    Freedom             Squares (SS)   Square (MS)               F Value
A            a - 1               SS_A           SS_A/(a - 1)              MS_A/MS_E
B            b - 1               SS_B           SS_B/(b - 1)              MS_B/MS_E
C            c - 1               SS_C           SS_C/(c - 1)              MS_C/MS_E
AB           (a - 1)(b - 1)      SS_AB          SS_AB/((a - 1)(b - 1))    MS_AB/MS_E
AC           (a - 1)(c - 1)      SS_AC          SS_AC/((a - 1)(c - 1))    MS_AC/MS_E
BC           (b - 1)(c - 1)      SS_BC          SS_BC/((b - 1)(c - 1))    MS_BC/MS_E
Error (E)    DF_E = subtraction  SS_E           SS_E/DF_E
Total        abcn - 1            SS_Total

Table 3. Analysis of Variance for “Reduced three-way model”


3.4.2 Method. Twenty subjects were presented 48 binaural sounds through
a set of headphones. In this forced choice experiment, the subjects were asked to
select the location from which they perceived the sound to come. Subjects
chose from a display provided on a computer screen by moving the mouse to place
the cursor over the perceived location and clicking the button (see figure 47).

The display had only 24 possible azimuths to choose from, and the elevation
angle was always zero. These 24 locations corresponded to the 24 possible locations
of the computer generated binaural signals. The signal used in each case was
digitally filtered with either the AAMRL HRTF data or the RBF HRTF data and delayed
appropriately for a given angle of azimuth. The filters were designed using the
program ESPS by Entropic Research Lab. The design was an FIR filter using the
weighted mean square error criterion. The weights corresponded to the AAMRL
HRTF or RBF HRTF for the 93 frequencies at the test location. A separate filter
was designed for each ear and for each location tested. In either case, the original
signal was the speech pattern for a male utterance of the word “seventeen.”

Subjects were asked to close their eyes during the data collection phase of this
experiment. Interaction effects between the subjects and the two fixed factors (angle
and method of signal generation) in this experiment are assumed to be zero. The
angle from which the signal was designed to come is called factor B, and has
24 levels (one for each azimuth). Factor A represents the method by which the
binaural signal was generated, and has 2 levels (one for AAMRL HRTF and one for
RBF HRTF). Factor C represents the subjects.

3.4.3 The Model. The model used in experiment two is the same as
described above in experiment one.

3.5  Conclusion 

This chapter presented the methodology for training a neural network to
approximate the HRTF for zero elevation. It also presented two experiments to verify
the methodology. The results of the training and experiments will be presented in
the following chapter.


IV.  Experiment  Results 


4.1  Introduction 

This  chapter  presents  the  results  of  the  two  experiments  conducted  during  this 
thesis  work.  Before  presenting  the  results  of  each  experiment  a  discussion  of  the 
types  of  errors  analyzed  in  the  experiments  is  presented. 

4.2 Types of Error

4.2.1 Algebraic Error. Algebraic error is measured by taking the difference
between the actual location of the sound source and the subject's perceived location.
The difference is calculated so that the result always lies between -180° and +180°,
inclusive. The sign of the result indicates the direction of the subject's response from
the actual location. Positive values indicate a clockwise difference and negative values
indicate a counter-clockwise difference (see figure 48). For example, if the actual
location is 349° and the subject responded with 10°, the algebraic error is calculated
as +21°. On the other hand, if the actual location is 10° and the subject responded
with 349°, the algebraic error is calculated as -21°.
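The wrap-around in the algebraic error can be sketched in Python as follows. The function name is hypothetical, and the convention that a half-turn is reported as -180° rather than +180° is an assumption; the text only requires the result to lie within ±180°.

```python
def algebraic_error(actual_deg, perceived_deg):
    """Signed localization error in degrees; positive means clockwise.

    The raw difference is wrapped into the interval [-180, 180).
    """
    return (perceived_deg - actual_deg + 180.0) % 360.0 - 180.0
```

For the two examples in the text, `algebraic_error(349, 10)` evaluates to 21.0 and `algebraic_error(10, 349)` evaluates to -21.0.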

4.2.2 Absolute Error. Absolute error is measured by taking the absolute
value of the difference between the actual location of the sound source and the
subject's perceived location of the sound source. The difference is calculated so that
the result is always less than or equal to 180°. For example, if the actual location
is 349° and the subject responded with 10°, the absolute error is calculated as 21° as
opposed to 339°.
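A minimal Python sketch of the absolute error takes the shorter of the two ways around the circle. The function name is hypothetical.

```python
def absolute_error(actual_deg, perceived_deg):
    """Unsigned localization error in degrees, always between 0 and 180."""
    diff = abs(perceived_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)
```

For the example in the text, `absolute_error(349, 10)` evaluates to 21.0 rather than 339.0.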

Figure 48. Algebraic Error, positive values indicate a clockwise difference and
negative values indicate a counter-clockwise difference

4.2.3 Reversals. Reversals are a phenomenon where the sound is perceived
flipped front to back or back to front from the actual location. Note this is not
a 180° shift, but a symmetric flip about the horizontal axis through the ears (see
figure 49). Figure 50 shows actual occurrences of this phenomenon from experiment
one. The left image shows a subject with back-front reversals with HRTF treatment.
The right image shows a different subject with front-back reversals without HRTF
treatment. The notation has the following meaning. The x symbol indicates actual
location while the o symbol indicates perceived location. The line between the x
and o shows the pairwise groupings. The gradual spacing toward the center of the
head is for clarity only and is not an indication of perceived distance.

4.2.4 Default to 90°. Default to 90° is a phenomenon where the perceived
location defaults to 90° or 270° no matter where the actual location is. This
phenomenon has been observed by other researchers (17). Figure 51 is an example of a
subject whose perceived locations defaulted to 90° and 270°.

Figure 49. Reversal, a symmetric flip about the horizontal axis through the ears

Figure 50. Example of a subject with back-front reversals with HRTF treatment
(left) and a different subject with front-back reversals without HRTF
treatment (right). Notation: x indicates actual location, o indicates
perceived location. The line between the x and o shows the pairwise
groupings. The gradual spacing toward the center of the head is for
clarity only and is not an indication of perceived distance.

Figure 51. Example of subject whose perceived locations defaulted to 90° and 270°.
Notation same as above

4.3 Experiment One Results

As stated previously, the primary objective of experiment one was to determine
whether or not the Head Related Transfer Function (HRTF) provides any advantage
to a listener attempting to identify the azimuth of a computer generated binaural
signal. More specifically, does the use of the HRTF in generating binaural signals
provide a listener with a signal that can be more accurately located in azimuth than
a binaural signal generated using only interaural time delay (ITD)? The HRTFs
used in this experiment were developed at Armstrong Aerospace Medical Research
Laboratories (AAMRL), Wright-Patterson AFB, OH (12).

Data  collected  for  20  subjects  was  used  to  determine  the  answer  to  the  question 
posed  above.  Calculations  in  this  statistical  analysis  of  the  data  are  performed  using 
Matlab.  The  data  used  is  found  in  appendix  A.  The  Matlab  code  is  found  in  appendix 
C.  This  experiment  is  a  reduced  three  factor  experiment.  Table  4  summarizes  the 
factors  in  this  experiment. 


Factor     Levels
Filter     2
Subject    20
Azimuth    24

Table 4. Experiment 1 factors

The levels of “Subject” represent each of the twenty individuals taking
part in the experiment, and the levels of “Filter” are 1 and 2 (1 corresponds to the
use of the HRTF and 2 corresponds to no use of HRTF). The levels of “Azimuth”
are 1, 2, 3, ..., 24 and correspond to the angles shown in table 5. The angle of
azimuth is measured as shown in figure 52.

Level   Angle    Level   Angle
1       10°      13      191°
2       32°      14      212°
3       45°      15      225°
4       58°      16      237°
5       69°      17      249°
6       82°      18      262°
7       98°      19      278°
8       111°     20      291°
9       123°     21      303°
10      135°     22      315°
11      148°     23      328°
12      169°     24      349°

Table 5. Levels of “Azimuth”

Figure 52. Azimuth, θ, of sound source relative to directly in front of face

Using the data set shown in appendix A,
with “Response” representing the magnitude of the difference between the subjects’
perceived angle and the true location, the following Analysis of Variance (ANOVA)
(table 6) was obtained.


Source of     Degrees of  Sum of        Mean         Calculated F     F at 1% level
Variation     Freedom     Squares (SS)  Square (MS)  (alternate F)    (alternate F)
Filter (A)    1           60554.26      60554.26     57.44 (7.47)     6.85, 6.63 (7.88)
Azimuth (B)   23          416834.70     18123.25     17.19 (2.23)     2.03, 1.79 (2.78, 2.70)
Subject (C)   19          48451.13      2550.06      2.42             2.19, 1.88
A * B         23          186224.72     8096.73      7.68             2.03, 1.79
A * C         19          23134.95      1217.63      1.15             2.19, 1.88
B * C         437         564293.18     1291.29      1.22             1.53, 1.00
Error         437         460732.48     1054.31
Total         959         1760225.42

Table 6. Analysis of Variance for “Response”


Recall the following model for this experiment:

$$x_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijkl}$$

Interaction effects between filters and azimuth must be tested. The null and alternate
hypotheses are $H_0^{(1)}$ (no interactions between filters and azimuth) and $H_1$ (the means
are not equal). The test statistic is $F_{calc}$ = (Mean Square Interaction) ÷ (Mean
Square Random error) = 7.68 (as seen in table 6). Testing at a significance level of
1% ($\alpha = .01$),

$$F_{calc} = 7.68 > F_{.99,23,437} \approx 2.03, 1.79.$$

Hence, $H_0^{(1)}$ can be rejected. Since the hypothesis $H_0^{(1)}$ for the Filter * Azimuth
interaction can be rejected, it can be concluded that the interactions are significantly
large. Differences in the factors are then significant only if they are large compared
with the interactions. For this reason, many statisticians recommend that $H_0^{(4)}$ and
$H_0^{(5)}$ be tested by using the F ratios $MS_A/MS_{AB}$ and $MS_B/MS_{AB}$ rather than those
given in table 3 (27). These alternate F values are the second values shown in
table 6. Testing $H_0^{(4)}$ at a significance level of 1%,

$$F_{calc,alt} = 7.47 < F_{.99,1,23} = 7.88.$$

Testing $H_0^{(5)}$ at a significance level of 1%,

$$F_{calc,alt} = 2.23 < F_{.99,23,23} \approx 2.78, 2.70.$$

Therefore the means of the filters are statistically equal and the means of the
azimuths are statistically equal.

Next, the same tests are completed for Filter * Subject interactions. Testing at a
significance level of 1% ($\alpha = .01$),

$$F_{calc} = 1.15 < F_{.99,19,437} \approx 2.19, 1.88$$

shows that $H_0^{(2)}$ cannot be rejected. It can be concluded that there are no Filter *
Subject interactions, and $H_0^{(4)}$ and $H_0^{(6)}$ can be tested. Testing $H_0^{(4)}$ at a significance
level of 1%,

$$F_{calc} = 57.44 > F_{.99,1,437} \approx 6.63.$$

Testing $H_0^{(6)}$ at a significance level of 1%,

$$F_{calc} = 2.42 > F_{.99,19,480} \approx 1.91.$$

Therefore neither the means of the Filters nor the means of the Subjects are
statistically equal.

Finally, the same tests are completed for Azimuth * Subject interactions. Testing
at a significance level of 1% ($\alpha = .01$),

$$F_{calc} = 1.22 < F_{.99,437,437} \approx 1.53, 1.00$$

shows that $H_0^{(3)}$ cannot be rejected. Testing $H_0^{(5)}$ at a significance level of 1%,

$$F_{calc} = 17.19 > F_{.99,23,437} \approx 2.03, 1.79.$$

Testing $H_0^{(6)}$ at a significance level of 1%,

$$F_{calc} = 2.42 > F_{.99,19,437} \approx 2.19, 1.88.$$

Therefore neither the means of the azimuths nor the means of the subjects are
statistically equal.
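As a check on the arithmetic, the F ratios quoted above can be recomputed from the degrees of freedom and sums of squares listed in table 6. The Python sketch below does this; the dictionary layout and function names are illustrative assumptions, not the Matlab code of appendix C.

```python
# (degrees of freedom, sum of squares) for each source, copied from table 6.
TABLE_6 = {
    "A": (1, 60554.26),
    "B": (23, 416834.70),
    "C": (19, 48451.13),
    "AB": (23, 186224.72),
    "AC": (19, 23134.95),
    "BC": (437, 564293.18),
    "E": (437, 460732.48),
}


def mean_squares(table):
    """Mean square = sum of squares / degrees of freedom for every source."""
    return {source: ss / df for source, (df, ss) in table.items()}


def f_ratio(ms, numerator, denominator="E"):
    """F ratio of two mean squares; the default denominator is the error term."""
    return ms[numerator] / ms[denominator]


ms = mean_squares(TABLE_6)
standard_f = {s: f_ratio(ms, s) for s in ("A", "B", "C", "AB", "AC", "BC")}
alternate_f = {s: f_ratio(ms, s, "AB") for s in ("A", "B")}  # MS_A/MS_AB, MS_B/MS_AB
```

The standard ratios agree with table 6 to two decimal places. The alternate ratios agree to within about 0.01, which suggests the tabulated 7.47 and 2.23 were truncated rather than rounded.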


Using the reversal corrections, the following Analysis of Variance (ANOVA)
(table 7) was obtained. If the error measured from the reversed location was smaller
than the error of the subject's actual response, then the reversed angle was used in
table 7. The number of reversals corrected was 432 out of 960 or 45%.
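One way to sketch the reversal correction in Python is shown below. It rests on two assumptions that the text does not spell out: that with azimuth measured from directly in front of the face (figure 52), the front-back mirror image of an angle θ about the axis through the ears is 180° - θ (modulo 360°), and that the mirrored response is the quantity compared against the actual location. The function names are hypothetical.

```python
def absolute_error(actual_deg, perceived_deg):
    """Unsigned localization error in degrees, always between 0 and 180."""
    diff = abs(perceived_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def mirror_front_back(azimuth_deg):
    """Reflect an azimuth about the axis through the ears (90 to 270 degrees)."""
    return (180.0 - azimuth_deg) % 360.0


def reversal_corrected_error(actual_deg, perceived_deg):
    """Return (error, was_reversal), keeping whichever error is smaller."""
    direct = absolute_error(actual_deg, perceived_deg)
    mirrored = absolute_error(actual_deg, mirror_front_back(perceived_deg))
    if mirrored < direct:
        return mirrored, True
    return direct, False
```

Under this rule a response at 169° to a sound presented at 10° counts as a reversal with a corrected error of 1°. A response adjacent to the ear axis can also be counted as a reversal even though the uncorrected error is small, for example 98° for a sound at 82°. Because the smaller error is always kept, the correction can only lower the mean error.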


Source of     Degrees of  Sum of        Mean         Calculated F     F at 1% level
Variation     Freedom     Squares (SS)  Square (MS)  (alternate F)    (alternate F)
Filter (A)    1           1811.77       1811.77      7.45 (3.28)      6.85, 6.63 (8.18)
Azimuth (B)   23          31188.33      1356.01      5.58             2.03, 1.79
Subject (C)   19          21444.68      1128.67      4.64 (2.04)      2.19, 1.88
A * B         23          7804.46       339.32       1.40             2.03, 1.79
A * C         19          10498.02      552.53       2.27             2.19, 1.88
B * C         437         137136.70     313.81       1.29             1.53, 1.00
Error         437         106269.46     243.18
Total         959         316153.40

Table 7. Analysis of Variance for “Response” with reversal corrections


From table 7, the calculated F values indicate significant interaction between
Filter * Subject. The other interactions, Filter * Azimuth and Azimuth * Subject,
indicate no significant interaction. Comparing within-factor variance, the means
of the filters are statistically equal; the means of the azimuths are not statistically
equal; and the means of the subjects are statistically equal.

Now, the main objective of this experiment is addressed. Is there a difference
in the mean response for HRTF versus non-HRTF binaural signals? To determine
the answer to this question, a pairwise comparison of the means for the two levels of
the variable “Filter” is conducted. The following formula is used to determine the
confidence interval for the difference of two population means:

$$\bar{X}_2 - \bar{X}_1 \pm z_c \sigma_{\bar{X}_2 - \bar{X}_1} = \bar{X}_2 - \bar{X}_1 \pm z_c \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \tag{37}$$

where,

$\bar{X}_1$ is the mean of the with HRTF filter error.
$\bar{X}_2$ is the mean of the without HRTF filter error.
$\sigma_1$ is the standard deviation of the with HRTF filter error.
$\sigma_2$ is the standard deviation of the without HRTF filter error.
$z_c$ is the 99% confidence coefficient ($z_c = 2.58$).
$N_1$ is the sample size of the with HRTF filter error.
$N_2$ is the sample size of the without HRTF filter error.

The point estimates for the means at each level are:

$$\bar{X}_1 = 35.73, \quad \bar{X}_2 = 51.61, \quad \bar{X}_2 - \bar{X}_1 = 15.88$$

We can therefore state with 99 percent confidence that:

$$8.87 < \bar{X}_2 - \bar{X}_1 < 22.90$$
The point estimates for the means at each level with reversal corrections are:

$$\bar{X}_1 = 20.12, \quad \bar{X}_2 = 17.37, \quad \bar{X}_2 - \bar{X}_1 = -2.75$$

We can therefore state with 99 percent confidence that:

$$-5.76 < \bar{X}_2 - \bar{X}_1 < 0.27$$

The first confidence interval shows an 8.87° to 22.90° advantage of using the HRTF
versus no HRTF. After reversal correction the interval ranges from an advantage of
0.27° to a disadvantage of 5.76°. The correction for reversals necessarily reduces the
error, because the algorithm used to include reversals always chooses the smaller
error. The mean with HRTF error was reduced from 35.73° to 20.12° and the mean
without HRTF error was reduced from 51.61° to 17.37°. The interaction of reversals
is examined further in the next section.
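The interval of equation (37) is straightforward to compute. The Python sketch below is a generic illustration with a hypothetical function name. The standard deviations that produced the intervals quoted above are not listed in this section, so no attempt is made here to reproduce those numbers.

```python
import math


def difference_of_means_interval(mean1, sd1, n1, mean2, sd2, n2, z_c=2.58):
    """Confidence interval for mean2 - mean1 of two independent samples.

    z_c = 2.58 is the 99% confidence coefficient used in the text.
    """
    difference = mean2 - mean1
    half_width = z_c * math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return difference - half_width, difference + half_width
```

An interval that lies entirely above zero indicates a larger mean error for the second treatment, while an interval that contains zero does not separate the two treatments at the chosen confidence level.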

4.3.1 Further examination of results. The results above are mixed regarding
the statistical advantage of the HRTF in localization accuracy over simple ITD, with
significant interaction between the location of the sound and whether the HRTF is used.

Figure 53 shows the absolute mean error and standard deviation for sounds
with the HRTF (solid) and without the HRTF (dash). The error bars (vertical lines)
show the standard deviation of the mean error. This figure reveals that the without
HRTF filter had greater error for presentation in front of the head than behind
the head. Figure 54 shows the algebraic mean error and standard deviation
for sounds with the HRTF (solid) and without the HRTF (dash). Again the error
bars show the standard deviation of the mean error. The algebraic error gives a
direction to the error. As can be seen from the figure, the sign of the error indicates
a general pulling of the perceived sound toward the ears. Figure 55 is perhaps a
better view of the data. In this figure the * shows the mean location of the perceived
sounds and the arcs show the standard deviation. As can be seen, the mean location
tends to be pulled near the 90° and 270° locations. Figure 55 also shows how the
without HRTF treatment is perceived more behind the head than in front of the head.
This parallels the results of Oldfield and Parker where the pinna-filled perceptions
were pulled toward the ears (18).

Figure 53. Absolute mean error and standard deviation for sounds with the HRTF
(solid) and without the HRTF (dash). The error bars show the standard
deviation of the mean error.

Figure 54. Algebraic mean error and standard deviation for sounds with the HRTF
(solid) and without the HRTF (dash). The error bars show the standard
deviation of the mean error.

Figure 55. Mean and standard deviation for sounds with the HRTF (left) and
without the HRTF (right). The arcs show the standard deviation of the mean
(shown as a *).

To present the data in more detail, appendix A has figures 65, 66, 67
and 68 depicting all the subjects' responses for each actual location. The notation
used in these figures is as follows: x and solid line indicate with HRTF, o and
dashed line indicate without HRTF. The arcs show the standard deviation of the
samples. The * and + indicate the mean with and without HRTF respectively.
These are the actual responses. No corrections have been made concerning reversals.

In an effort to take reversals into account, polar histograms of the data are
presented next. The polar histogram shows the frequency of subjects’ responses for
a specific angle of azimuth. Figure 56 shows the polar histogram of all responses for
sounds with the HRTF (solid) and without the HRTF (dash). The dotted circle
indicates the number of times the sounds were actually presented at that location.
Notice that subjects heard sound at some locations more times than the sound was
actually presented there. Recall that each of the 20 subjects heard 48 sounds, one
with HRTF and one without HRTF for each of the 24 azimuth locations, for a total
of 960 sounds. This figure again indicates that subjects tended not to perceive sounds
in front of or behind the head. However, without the HRTF, subjects perceived fewer
sounds in front of the head and more sounds behind the head than they did with
the HRTF.

Figure 56. Polar Histogram of all responses for sounds with the HRTF (solid) and
without the HRTF (dash).

Tables 8 and 9 show statistics for the with HRTF and without HRTF filters.
Without taking reversal corrections into account, 41.25% and 34.58% of the with
HRTF and without HRTF responses respectively were correct within one button
(±23°). Recall the button layout is shown in figure 47. Figure 57 is a polar histogram
of all responses correct within one button. This figure shows that the without HRTF
correct responses are greater behind the head while the with HRTF correct responses
are more evenly distributed front and back.

                                    With HRTF    Without HRTF
Number of samples                   480          480
rmserr                              0.4500       5.9583
stderr                              50.7709      70.8760
correct - exact                     16.88%       14.37%
correct - within one button (23°)   41.25%       34.58%

Table 8. Without reversal corrections (nor)

                                    With HRTF    Without HRTF
Number of samples                   480          480
rmserr                              0.9375       0.5458
stderr                              27.6543      24.8911
Number of reversals                 176          259
correct - exact                     24.70%       28.75%
correct - within one button (23°)   57.29%       66.87%

Table 9. With reversal corrections (rev)
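The "correct - exact" and "correct - within one button" rows can be computed with a simple tolerance count. The Python sketch below uses hypothetical function names and assumes each response is stored as an (actual, perceived) pair of azimuths in degrees.

```python
def absolute_error(actual_deg, perceived_deg):
    """Unsigned localization error in degrees, always between 0 and 180."""
    diff = abs(perceived_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def percent_correct(responses, tolerance_deg=0.0):
    """Percentage of (actual, perceived) pairs whose error is within tolerance."""
    hits = sum(1 for actual, perceived in responses
               if absolute_error(actual, perceived) <= tolerance_deg)
    return 100.0 * hits / len(responses)
```

Calling `percent_correct(responses)` gives the exact-match score, and `percent_correct(responses, 23.0)` gives the within-one-button score.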

Next, taking reversal corrections into account, 57.29% and 66.87% of the with
HRTF and without HRTF responses respectively were correct within one button
(±23°). Figure 58 is a polar histogram of all responses with reversal corrections that
are correct within one button. This figure shows that both the without HRTF correct
responses and the with HRTF correct responses are more evenly distributed front and
back. The percentage of reversals with the HRTF and without the HRTF was 36.67%
and 53.96% respectively. Oldfield and Parker found few reversals with the pinna
unfilled and 26% with the pinna filled (17, 18). The relative increase in the number
of reversals without the HRTF indicates that the HRTF helps in reducing the number
of reversals. The differences between Oldfield and Parker and this thesis also indicate
that there may be cues in the actual sound that are lost in the filtered sound. Note:
Oldfield and Parker used white noise while this thesis used filtered speech.
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Figure  57. 


Figure  58. 
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Polar  Histogram  of  all  responses  correct  within  one  button  for  sounds 
with  the  HRTF  (solid)  and  without  the  HRTF  (dash). 
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Polar  Histogram  of  reversal  corrected  responses  correct  within  one  but¬ 
ton  for  sounds  with  the  HRTF  (solid)  and  without  the  HRTF  (dash). 
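
The reversal percentages quoted above follow directly from the reversal counts and sample sizes in table 9. A minimal sketch of that arithmetic in Python; the helper name `percent` and the dictionary layout are illustrative choices and are not taken from the thesis code.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


N_SAMPLES = 480  # responses per filter condition (table 9)
REVERSALS = {"with HRTF": 176, "without HRTF": 259}  # counts from table 9

# Percentage of responses that needed a front-back reversal correction.
reversal_rates = {name: percent(n, N_SAMPLES) for name, n in REVERSALS.items()}

if __name__ == "__main__":
    for name, rate in reversal_rates.items():
        print(f"{name}: {rate}% reversals")
```

The same helper reproduces any of the "correct" percentages in tables 8 and 9 once the underlying counts are known.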




4.3.2 Summary of Experiment One Results. Table 10 shows a summary of
the  ANOVA  hypotheses  acceptances  and  rejections.  An  “accept”  means  the  means 


Source  of 
Variation 

No  Reversals 
Hypothesis 

Reversals 

Hypothesis 

Filter  (A) 

reject 

accept 

Azimuth  (B) 

Subject  (C) 

reject 

reject 

accept 

A  *  B 

reject 

accept 

A  *  C 

accept 

reject 

B  *  C 

accept 

accept 

Table  10.  Summary  of  Experiment  One 


are statistically equal, while a “reject” means the difference between the means is
statistically significant. Depending on the statistical method used, we can conclude that
the HRTF does provide a statistical advantage in localization accuracy over ITD
alone. There is, however, a statistically significant interaction between the location
of the sound and whether the HRTF is used. When this interaction is removed
using the alternate F value, the statistics give mixed results of equal means for the
filters and azimuth. This may mean that at certain angles of azimuth, the HRTF
either provides no advantage at all or hinders localization capabilities. Adding in
the corrections for reversals changes the results to where the means of the azimuth
are not statistically equal. The reversal corrections inherently reduce the error
results. However, comparing the number of reversals indicates an advantage of using
the HRTF over no HRTF.


4.4 Experiment Two Results

The primary objective of experiment two was to determine if a neural network
could be trained to learn the HRTF.

Source of      Degrees of   Sum of          Mean          Calculated   F at
Variation      Freedom      Squares (SS)    Square (MS)   F Value      1% level
Filter (A)     1            7578.07         7578.07       8.58         6.85, 6.63
Azimuth (B)    23           287455.55       12498.07      14.15        2.03, 1.79
                                                          8.39         2.03, 1.79
Subject (C)    19           39938.99        2102.05       2.38         2.19, 1.88
                                                          1.41         2.19, 1.88
A * B          23           27216.78        1183.34       1.34         2.03, 1.79
A * C          19           19117.75        1006.20       1.14         2.19, 1.88
B * C          437          650785.39       1489.21       1.69
Error          437          385933.34       883.14
Total          959          1418025.87

Table 11. Analysis of Variance for “Response”


Recall the following model for this experiment:

\[ X_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \epsilon_{ijkl} \]

Interaction effects between filters and azimuth must be tested first. The null and
alternate hypotheses are $H_0$ (no interactions between filters and azimuth) and $H_1$
(the means are not equal). The test statistic is $F_{calc}$ = (Mean Square Interaction) ÷
(Mean Square Random Error) = 1.34 (as seen in table 11). Testing at a significance
level of 1% ($\alpha = .01$),

\[ F_{calc} = 1.34 < F_{.99,23,437} \approx 2.03, 1.79. \]

Hence, the null hypothesis cannot be rejected. It can be concluded that there is no
Filter * Azimuth interaction, and the hypotheses of equal filter means and of equal
azimuth means can be tested. For the filters, testing at a significance level of 1%,

\[ F_{calc} = 8.58 > F_{.99,1,437} \approx 6.85, 6.63. \]

For the azimuths, testing at a significance level of 1%,

\[ F_{calc} = 14.15 > F_{.99,23,437} \approx 2.03, 1.79. \]

Therefore neither the means of the filters nor the means of the azimuths are
statistically equal.

Next, the same tests are completed for Filter * Subject interactions. Testing at a
significance level of 1% ($\alpha = .01$),

\[ F_{calc} = 1.14 < F_{.99,19,437} \approx 2.19, 1.88 \]

shows that the null hypothesis of no interaction cannot be rejected. It can be
concluded that there is no Filter * Subject interaction, and the hypotheses of equal
filter means and of equal subject means can then be tested. For the filters, testing
at a significance level of 1%, $F_{calc} = 8.58 > F_{.99,1,437} \approx 6.85, 6.63$. For the
subjects, testing at a significance level of 1%,

\[ F_{calc} = 2.38 > F_{.99,19,437} \approx 2.19, 1.88. \]

Therefore neither the means of the filters nor the means of the subjects are
statistically equal.

Finally, the same tests are completed for Azimuth * Subject interactions. Testing
at a significance level of 1% ($\alpha = .01$),

\[ F_{calc} = 1.69 > F_{.99,437,437} \approx 1.53, 1.00 \]

shows that the null hypothesis of no interaction can be rejected. Differences in the
factors are then significant only if they are large compared with the interactions.
Again the alternate F values are used. These alternate F values are the second
values shown in table 11. Testing the equality of the azimuth means at a
significance level of 1%,

\[ F_{calc,alt} = 8.39 > F_{.99,23,437} \approx 2.03, 1.79. \]

Testing the equality of the subject means at a significance level of 1%,

\[ F_{calc,alt} = 1.41 < F_{.99,19,437} \approx 2.19, 1.88. \]

Therefore the means of the azimuths are not statistically equal; however, the means
of the subjects are statistically equal.
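
The F ratios used in the tests above can be recomputed from the mean squares in table 11. The following Python sketch is illustrative only: the dictionary names and helper functions are hypothetical, the comparison uses the larger of the two tabulated critical values quoted in the text, and the alternate F is assumed to be the ratio of a factor's mean square to the Azimuth * Subject interaction mean square (an assumption that reproduces the tabulated 8.39 and 1.41).

```python
# Mean squares copied from table 11 (experiment two, no reversal corrections).
MEAN_SQUARES = {
    "A": 7578.07, "B": 12498.07, "C": 2102.05,
    "AB": 1183.34, "AC": 1006.20, "BC": 1489.21,
    "error": 883.14,
}

# Larger of the two tabulated 1% critical values quoted in the text.
F_CRIT_1PCT = {"A": 6.85, "B": 2.03, "C": 2.19, "AB": 2.03, "AC": 2.19, "BC": 1.53}


def f_ratio(effect, denominator="error"):
    """F = (mean square of the effect) / (mean square of the denominator)."""
    return MEAN_SQUARES[effect] / MEAN_SQUARES[denominator]


def rejects_null(effect, denominator="error"):
    """True when the calculated F exceeds the 1% critical value."""
    return f_ratio(effect, denominator) > F_CRIT_1PCT[effect]


if __name__ == "__main__":
    for effect in ("AB", "AC", "BC", "A", "B", "C"):
        print(effect, round(f_ratio(effect), 2), rejects_null(effect))
    # Alternate F values: assumed denominator is the B*C interaction mean square.
    for effect in ("B", "C"):
        print(effect, "alt", round(f_ratio(effect, "BC"), 2), rejects_null(effect, "BC"))
```

Using the interaction mean square as the denominator is what turns the Subject result from a rejection into an acceptance.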

Using the reversal corrections, the following Analysis of Variance (ANOVA)
(table 12) was obtained. If the error of the front-back reversed location was smaller
than the error of the subject's actual response, then the reversed angle was used in
table 12. The number of reversal corrections was 351 out of 960 or 37%.


Source of      Degrees of   Sum of          Mean          Calculated   F at
Variation      Freedom      Squares (SS)    Square (MS)   F Value      1% level
Filter (A)     1            1249.46         1249.46       9.23         6.85, 6.63
                                                          3.94         7.88
Azimuth (B)    23           16741.34        727.88        5.38
                                                          2.29
Subject (C)    19           6288.65         330.98        2.45         2.19, 1.88
A * B          23           7303.92         317.56        2.35         2.03, 1.79
A * C          19           3282.8          172.78        1.28         2.19, 1.88
B * C          437          89632.50        205.11        1.52         1.53, 1.00
Error          437          59127.73        135.30
Total          959          183626.43

Table 12. Analysis of Variance for “Response” with reversal corrections
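
The reversal correction used for table 12 can be sketched as follows. This is an illustrative Python rendering, not the thesis code: it assumes the front-back mirroring found in the anova.m listing of appendix C (180° minus the response for responses below 180°, 540° minus the response otherwise), and the function names are hypothetical.

```python
def wrap_error(actual_deg, response_deg):
    """Absolute angular error in degrees, folded into the range 0..180."""
    diff = abs(actual_deg - response_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def mirror_front_back(response_deg):
    """Reflect an azimuth about the axis through the ears (90-270 degrees)."""
    return (180.0 - response_deg) % 360.0


def reversal_corrected_error(actual_deg, response_deg):
    """Return (error, was_reversal).

    The mirrored response replaces the actual response only when it lies
    closer to the true location, which is the rule stated for table 12.
    """
    raw = wrap_error(actual_deg, response_deg)
    mirrored = wrap_error(actual_deg, mirror_front_back(response_deg))
    if mirrored < raw:
        return mirrored, True
    return raw, False


if __name__ == "__main__":
    # A sound at 10 degrees heard at 169 degrees is counted as a reversal.
    print(reversal_corrected_error(10, 169))
    # A sound at 32 degrees heard at 45 degrees is left unchanged.
    print(reversal_corrected_error(32, 45))
```

Because the correction can only lower an error, the corrected error statistics are never larger than the uncorrected ones.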


From table 12, the calculated F values indicate a significant Filter * Azimuth
interaction. The other interactions, Filter * Subject and Azimuth * Subject,
are not significant. Comparing the within-factor variance, the means
of the filters are statistically equal; the means of the azimuths are statistically equal;
and the means of the subjects are not statistically equal.
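
The second F values listed for Filter and Azimuth in table 12 appear to follow the same alternate-F convention as before, this time with the Filter * Azimuth interaction mean square as the denominator. That reading is an inference from the tabulated numbers rather than something stated in the text:

```latex
\[
F_{alt}(A) = \frac{MS_A}{MS_{AB}} = \frac{1249.46}{317.56} \approx 3.93,
\qquad
F_{alt}(B) = \frac{MS_B}{MS_{AB}} = \frac{727.88}{317.56} \approx 2.29
\]
```

These agree with the tabulated 3.94 and 2.29 to within rounding of the mean squares. The alternate Filter value lies below the 7.88 listed beside it in table 12, which is consistent with the statement that the filter means are statistically equal.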

Now the main objective of this experiment is addressed. Is there a difference
in the mean response for AAMRL HRTF versus RBF HRTF binaural signals? To
determine the answer to this question, a pairwise comparison of the means for the
two levels of the variable “Filter” is conducted. The following formula is used to
determine the confidence interval for the difference of two population means:

\[ \bar{X}_2 - \bar{X}_1 \pm z_c \sigma_{\bar{X}_2 - \bar{X}_1} = \bar{X}_2 - \bar{X}_1 \pm z_c \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \]    (38)

where,

$\bar{X}_1$ is the mean of the AAMRL HRTF filter error.
$\bar{X}_2$ is the mean of the RBF HRTF filter error.
$\sigma_1$ is the standard deviation of the AAMRL HRTF filter error.
$\sigma_2$ is the standard deviation of the RBF HRTF filter error.
$z_c$ is the 99% confidence coefficient ($z_c = 2.58$).
$N_1$ is the sample size of the AAMRL HRTF filter error.
$N_2$ is the sample size of the RBF HRTF filter error.


Since the point estimates for the means at each level are:

\[ \bar{X}_1 = 38.28, \quad \bar{X}_2 = 32.66, \quad \bar{X}_2 - \bar{X}_1 = -5.62 \]

we can state with 99 percent confidence that:

\[ -12.01 < \bar{X}_2 - \bar{X}_1 < 0.77 \]

With reversal corrections, the point estimates for the means at each level are:

\[ \bar{X}_1 = 17.18, \quad \bar{X}_2 = 14.90, \quad \bar{X}_2 - \bar{X}_1 = -2.28 \]

and we can state with 99 percent confidence that:

\[ -4.58 < \bar{X}_2 - \bar{X}_1 < 0.02 \]

The confidence interval shows a 0.77° advantage to a 12.01° disadvantage of using
the AAMRL HRTF versus the RBF HRTF. After reversal correction the range narrows
to a 0.02° advantage to a 4.58° disadvantage. Both intervals contain zero, so at the
99% level neither interval by itself establishes an advantage for either filter. Again
the correction for reversals reduced the error. The mean AAMRL HRTF error was
reduced from 38.28° to 17.18° and the mean RBF HRTF error was reduced from
32.66° to 14.90°. The interaction of reversals is examined further in the next section.
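
The confidence-interval calculation of equation 38 can be sketched in Python as below. The two means and the sample sizes are taken from the text. The standard deviations of the error used for the reported intervals are not listed in this section, so the value 40.0 is a hypothetical placeholder chosen only to make the example run; it will not reproduce the reported limits exactly. The function name is also illustrative.

```python
import math


def mean_difference_interval(mean1, mean2, std1, std2, n1, n2, z_c=2.58):
    """Confidence interval for (mean2 - mean1) of two independent samples.

    z_c = 2.58 is the 99% confidence coefficient used in the text.
    """
    std_err = math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    diff = mean2 - mean1
    return diff - z_c * std_err, diff + z_c * std_err


# Means and sample sizes from the text; the standard deviations (40.0) are
# hypothetical placeholders, not values reported in the thesis.
low, high = mean_difference_interval(38.28, 32.66, 40.0, 40.0, 480, 480)

if __name__ == "__main__":
    print(f"{low:.2f} < X2 - X1 < {high:.2f}")
```

An interval that straddles zero, as both reported intervals do, means the sign of the difference is not established at the chosen confidence level.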

4.4.1 Further examination of results. Figure 59 shows the absolute mean
error and standard deviation for sounds with the AAMRL HRTF (solid) and RBF
HRTF (dash). The error bars show the standard deviation of the mean error. This
figure reveals that the AAMRL HRTF had slightly greater error than the RBF HRTF. A
comparison of the standard deviations reveals no significant trends.

Figure 59. Absolute mean error and standard deviation for sounds with the
AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show
the standard deviation of the mean error.

Figure 60 shows the algebraic mean error and standard deviation for sounds with
the AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show the standard
deviation of the mean error. The algebraic error gives a direction to the error. As
can be seen from the figure, the sign of the error indicates a general pulling of the
perceived sound toward the ears.

Figure 60. Algebraic mean error and standard deviation for sounds with the
AAMRL HRTF (solid) and RBF HRTF (dash). The error bars show
the standard deviation of the mean error.

Figure 61 is perhaps a better view of the data. In this figure the * shows the mean
location of the perceived sounds and the arcs show the standard deviation. As can
be seen, the mean location tends to be pulled toward the 90° and 270° locations.
This parallels the results of experiment one and of Oldfield and Parker, where the
pinna-filled perceptions were pulled toward the ears (18). Begault and Wenzel also
reported that subjects had a pattern of pulling toward the vertical-lateral
plane (3).
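
The distinction between the absolute and the algebraic (signed) error can be illustrated with a short Python sketch. The wrapping convention and the function names are assumptions made for illustration; the sketch takes the error as response minus actual, wrapped so that it never exceeds half a circle in magnitude.

```python
def signed_error(actual_deg, response_deg):
    """Signed angular error (response minus actual), wrapped into [-180, 180)."""
    return (response_deg - actual_deg + 180.0) % 360.0 - 180.0


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Two hypothetical responses to a sound at 0 degrees, one on each side.
    errors = [signed_error(0, 21), signed_error(0, 339)]
    print("algebraic mean error:", mean(errors))
    print("absolute mean error:", mean([abs(e) for e in errors]))
```

Errors of opposite sign cancel in the algebraic mean but not in the absolute mean, which is why the algebraic error shows the direction of any systematic pull.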

To present the data in more detail, appendix B has figures 69, 70, 71 and 72
depicting all the subjects’ responses for each actual location. The notation used in
these figures is as follows: x and solid line indicate AAMRL HRTF, o and dashed
line indicate RBF HRTF. The arcs show the standard deviation of the samples. The
* and + indicate the mean AAMRL and RBF HRTF respectively. These are the
actual responses. No corrections have been made concerning reversals.

Figure 61. Mean and standard deviation for sounds with the AAMRL HRTF (left)
and RBF HRTF (right). The arcs show the standard deviation of the
mean (shown as a *).


In an effort to take reversals into account, polar histograms of the data are
presented next. The polar histogram shows the frequency of subjects’ responses for
a specific angle of azimuth. Figure 62 shows the polar histogram of all responses for
sounds with the AAMRL HRTF (solid) and the RBF HRTF (dash). The AAMRL
HRTFs are labeled “Actual HRTF” in the figure legend. The dotted circle indicates the
number of times the sound was actually presented at that location. Again the
responses for the AAMRL HRTF and RBF HRTF are similar, with the larger number
of responses occurring on the sides.

Tables 13 and 14 show statistics for the AAMRL HRTF and RBF HRTF
filters. Without taking reversal corrections into account, 40.62% and 51.88% of
the AAMRL HRTF and RBF HRTF responses, respectively, were correct within one
button (±23°). Figure 63 is a polar histogram of all responses correct within one
button. This figure shows an even distribution between filters. The front and back
distribution may slightly favor the back.




Figure  62.  Polar  Histogram  of  all  responses  for  sounds  with  the  AAMRL  HRTF 
(solid)  and  RBF  HRTF  (dash). 


                                      AAMRL HRTF   RBF HRTF
Number of samples                     480          480
rmserr                                1.6354       0.0521
stderr                                54.2080      50.4625
correct - exact                       15.42%       22.08%
correct - within one button (±23°)    40.62%       51.88%

Table 13. Without reversal corrections (nor)

                                      AAMRL HRTF   RBF HRTF
Number of samples                     480          480
rmserr                                0.1146       0.3312
stderr                                22.5449      19.9285
Number of reversals                   188          163
correct - exact                       25.21%       30.63%
correct - within one button (±23°)    62.71%       70.42%

Table 14. With reversal corrections (rev)




Figure  63.  Polar  Histogram  of  all  responses  correct  within  one  button  for  sounds 
with  the  AAMRL  HRTF  (solid)  and  RBF  HRTF  (dash). 

Next, taking reversal corrections into account, 62.71% and 70.42% of the
AAMRL HRTF and RBF HRTF responses, respectively, were correct within one
button (±23°). Figure 64 is a polar histogram of all responses with reversal corrections
that are correct within one button. This figure shows that there are more correct
responses for both the AAMRL HRTF and the RBF HRTF, and that they are more
evenly distributed front and back. The percentage of reversals with the AAMRL HRTF and
the RBF HRTF was 39.17% and 33.96% respectively. Begault and Wenzel reported
that the mean number of reversals over all the locations was 29% (3). The results of
experiment one showed the AAMRL HRTF reversals to be 36.67%.

4.4.2 Summary of Experiment Two Results. Table 15 shows a summary of
the ANOVA hypothesis acceptances and rejections. An “accept” means the means
are statistically equal, while a “reject” means the difference between the means is
statistically significant. Without considering reversals, we can conclude that the RBF
HRTF provides a statistical advantage in localization accuracy over the AAMRL
HRTF from which it was derived. However, with reversal corrections included, a

Figure 64. Polar Histogram of reversal corrected responses correct within one
button for sounds with the AAMRL HRTF (solid) and RBF HRTF (dash).


Source  of 
Variation 

No  Reversals 
Hypothesis 

Reversals 

Hypothesis 

Filter  (A) 

reject 

reject 

accept 

■ ■ 

Subject  (C) 

reject 

accept 

accept 

A  *  B 

accept 

reject 

A  *  C 

accept 

accept 

B  *  C 

reject 

accept 

Table 15. Summary of Experiment Two

statistical advantage is not indicated and the means of the two filters are
statistically equal. There is also a statistically significant interaction between the location
of the sound and the filter. As discussed in experiment one, this may mean that
at certain angles of azimuth, the HRTF either provides no advantage at all or
hinders localization capabilities. The reversal corrections inherently reduce the
error results. However, comparing the number of reversals does not indicate a large
difference between the AAMRL HRTF and the RBF HRTF.

V. Conclusions and Recommendations

5.1  Summary 

This thesis determined whether an artificial neural network (ANN) can
approximate the Armstrong Aerospace Medical Research Laboratories (AAMRL) head
related transfer functions (HRTF) data. In order to test this hypothesis, two separate
tests were performed. The first test determined whether HRTF lends any support in
sound localization when compared to no HRTF (Interaural Time Delay only). The
second test determined whether HRTF and ANN lend the same amount of support
in sound localization when compared to each other.

The  following  research  objectives  were  met: 

•  Modified AFIT algorithms that have been used successfully for placing sounds
in azimuth. The algorithms that create 3D sound with the AAMRL HRTF
were changed to use the ANN HRTF instead. This allowed the creation of
tests to compare the ANN HRTF with the AAMRL HRTF.

•  Statistically checked for an advantage of HRTF and ITD versus ITD only.

•  Statistically checked for a difference between AAMRL HRTF data and ANN
HRTF data.

•  Investigated the program LNKmap as a neural network tool for function
approximation.

5.2  Conclusions 

The final conclusion is that the AAMRL HRTFs can be approximated by
an artificial neural network for azimuth positions at zero degrees elevation. The
point estimates for the difference between the AAMRL and RBF HRTF mean errors were
only 5.62° and 2.28° without and with reversal corrections respectively. The point
estimates indicate that the RBF HRTF had slightly lower error than the AAMRL
HRTF. However, the 99% confidence interval of the mean error shows no advantage
one way or the other.

To show that the HRTF was in fact contributing to sound localization,
experiment one was conducted. The results indicate an advantage in localization accuracy
with HRTF. There is a statistically significant interaction between the location of
the sound and whether the HRTF is used. This result may indicate that for the 0°
elevations ITD has a strong influence. Comparing the numbers of reversals clearly
indicates an advantage of using the HRTF over no HRTF. It is important to note that
the 3D sounds presented in the experiments were the direct paths only. Smith showed
an improvement in results when attenuated reflections were also included (25).

5.3  Recommendations  for  Future  Research 

The obvious next step is to expand the neural network training to the other
elevations of the AAMRL data. The AAMRL data has HRTF responses for elevation
±90°. Oldfield and Parker found that the acuity of sound localization varied with
azimuth and elevation (17). The ANN HRTF would have to be tested in these
non-zero elevation regions.

The AAMRL HRTF 0° elevation data in this thesis was spaced approximately
23° apart. Future research could investigate the interpolation accuracy of the ANN
against the AAMRL HRTF data spaced 1° apart at 0° elevation.

Replacing the AAMRL HRTF samples by an artificial neural network
approximator may also provide insight into underlying functions within the HRTF. Since
subjects were able to localize with the smoothed ANN HRTF, simpler filter designs
may be implemented.

Implementing the ANN HRTF in hardware may provide an alternate approach
to real-time implementation other than the DIRAD and Convolvotron.

Appendix A. Experiment 1 Data: With and Without HRTF

Figure 65. All subjects’ responses for actual locations, 10°, 32°, 45°, 58°, 69° and
82° shown left to right, top to bottom respectively. Notation: x and
solid line indicate with HRTF, o and dashed line indicate without
HRTF. The arcs show the standard deviation of the samples. The *
and + indicate the mean with and without HRTF respectively.

Figure 67. All subjects’ responses for actual locations, 191°, 212°, 225°, 237°, 249°
and 262° shown left to right, top to bottom respectively. Notation is
same as above.

and 349° shown left to right, top to bottom respectively. Notation is
same as above.

Appendix B. Experiment 2 Data: AAMRL HRTF and RBF HRTF

Figure 69. All subjects’ responses for actual locations, 10°, 32°, 45°, 58°, 69° and
82° shown left to right, top to bottom respectively. Notation: x and
solid line indicate with AAMRL HRTF, o and dashed line indicate RBF
HRTF. The arcs show the standard deviation of the samples. The * and
+ indicate the mean AAMRL and RBF HRTF respectively.

Figure 70. All subjects’ responses for actual locations, 98°, 111°, 123°, 135°, 148°
and 169° shown left to right, top to bottom respectively. Notation is
same as above.

Figure 71. All subjects’ responses for actual locations, 191°, 212°, 225°, 237°, 249°
and 262° shown left to right, top to bottom respectively. Notation is
same as above.

Figure 72. All subjects’ responses for actual locations, 278°, 291°, 303°, 315°, 328°
and 349° shown left to right, top to bottom respectively. Notation is
same as above.

Appendix C. Matlab M-files

A  sampling  of  the  graphic  commands  in  Matlab  is  listed  in  table  16. 


Line and area fill commands.
  plot3       Plot lines and points in 3-D.
  fill3       Draw filled 3-D polygons in 3-D.
  comet3      3-D comet-like trajectories.

Contour and other 2-D plots of 3-D data.
  contour     Contour plot.
  contour3    3-D contour plot.
  clabel      Contour plot elevation labels.
  contourc    Contour plot computation.
  pcolor      Pseudocolor (checkerboard) plot.
  quiver      Quiver plot.

Surface and mesh plots.
  mesh        3-D mesh surface.
  meshc       Combination mesh/contour plot.
  meshz       3-D Mesh with zero plane.
  surf        3-D shaded surface.
  surfc       Combination surf/contour plot.
  surfl       3-D shaded surface with lighting.
  waterfall   Waterfall plot.

3-D objects.
  cylinder    Generate cylinder.
  sphere      Generate sphere.

Volume visualization.
  slice       Volumetric visualization plots.

Graph appearance.
  view        3-D graph viewpoint specification.
  viewmtx     View transformation matrices.
  hidden      Mesh hidden line removal mode.
  shading     Color shading mode.
  axis        Axis scaling and appearance.
  caxis       Pseudocolor axis scaling.
  colormap    Color look-up table.

Graph annotation.
  text        Text annotation.
  title       Graph title.
  xlabel      X-axis label.
  ylabel      Y-axis label.
  zlabel      Z-axis label for 3-D plots.

Table 16. Three dimensional graphic commands. Copyright (c) 1984-93 by The
MathWorks, Inc.


C.1 angleresponse.m

function [meanerr, stdall]=angleresponse(allsubjects);
% ANGLERESPONSE Determines the mean error and standard deviation from file
% allsubjects for each angle of azimuth in experiments one and two.
% Written by Capt John K. Millhouse

[m,n]=size(allsubjects);

% correct for bad responses
for i=1:m,
  for j=1:n,
    if allsubjects(i,j)==0,
      allsubjects(i,j)=allsubjects(i,1);
    end
    if abs(allsubjects(i,j)-allsubjects(i,1))<180,
      errall(i,j)=allsubjects(i,j)-allsubjects(i,1);
    else
      errall(i,j)=allsubjects(i,j)-allsubjects(i,1)-360;
    end
  end
end
az=errall(1:24,1);

% calculate mean and std and plot results
for i=1:24,
  meanerr(i,1:3)=mean(errall(i:24:480,1:3));
  stdall(i,1:3)=(std(errall(i:24:480,1:3)));
  theta1=zeros(40,1);
  theta1(1:2:40)=allsubjects(i:24:480,2)*pi/180;
  rho1=zeros(40,1);
  rho1(1:2:40)=ones(20,1);
  theta2=zeros(40,1);
  theta2(1:2:40)=allsubjects(i:24:480,3)*pi/180;
  rho2=zeros(40,1);
  rho2(1:2:40)=.95*ones(20,1);
  mypolar(theta1,rho1,'g-')
  hold on
  mypolar(theta2,rho2,'r:')
  mypolar(allsubjects(i:24:480,2)*pi/180,ones(20,1),'gx')
  mypolar(allsubjects(i:24:480,3)*pi/180,.95*ones(20,1),'ro')
  mypolar([0 (allsubjects(i,1)+meanerr(i,2))*pi/180],...
          [0 1.05],'g*')
  mypolar([0 (allsubjects(i,1)+meanerr(i,3))*pi/180],...
          [0 1.05],'r*')
  hold off
  title(['azimuth = ',...
         num2str(allsubjects(i,1)),', ',...
         num2str(meanerr(i,2)),', ',...
         num2str(meanerr(i,3))])
  pause
end


C.2 anova.m

% get data
allsubjects=allsubjects2;
trial=1;
index=0;

actual=allsubjects(:,1);
with=allsubjects(:,2);
without=allsubjects(:,3);

% correct for bad responses...make mean of other subjects
%without(107)=122.78;
%without(275)=122.78;
la=length(actual);
for i=1:la
  index=index+1;
  if index==25, index=1; end
  if with(i)==0, with(i)=sum(with(index:24:480))/19;
  end
  if without(i)==0, without(i)=sum(without(index:24:480))/19;
  end
end

% front-back mirror image of each response
for i=1:la
  if with(i)<180,
    rwith(i)=180-with(i);
  else
    rwith(i)=540-with(i);
  end
  if without(i)<180,
    rwithout(i)=180-without(i);
  else
    rwithout(i)=540-without(i);
  end
end

witherr=zeros(480,1);
withouterr=zeros(480,1);
rwitherr=zeros(480,1);
rwithouterr=zeros(480,1);
for i=1:la,
  witherr(i)=actual(i)-with(i);
  withouterr(i)=actual(i)-without(i);
  rwitherr(i)=actual(i)-rwith(i);
  rwithouterr(i)=actual(i)-rwithout(i);
  if abs(witherr(i))>180,
    witherr(i)=abs(360 - abs(witherr(i)));
  else
    witherr(i)=abs(witherr(i));
  end
  if abs(withouterr(i))>180,
    withouterr(i)=abs(360 - abs(withouterr(i)));
  else
    withouterr(i)=abs(withouterr(i));
  end
  if abs(rwitherr(i))>180,
    rwitherr(i)=abs(360 - abs(rwitherr(i)));
  else
    rwitherr(i)=abs(rwitherr(i));
  end
  if abs(rwithouterr(i))>180,
    rwithouterr(i)=abs(360 - abs(rwithouterr(i)));
  else
    rwithouterr(i)=abs(rwithouterr(i));
  end
end

% correct for reversal
revcount=0;
for i=1:la,
  if witherr(i)>rwitherr(i),
    witherr(i)=rwitherr(i);
    revcount=revcount+1;
  end
  if withouterr(i)>rwithouterr(i),
    withouterr(i)=rwithouterr(i);
    revcount=revcount+1;
  end
end

% start anova %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X=[witherr withouterr];
a=2;
b=24;
c=20;
n=1;

% get totals %%%%%%%%%%%%%%%%%%%%%%%
Ti=zeros(a,1)
Tj=zeros(b,1)
Tk=zeros(c,1)
Tij=zeros(a,b);
Tij2=zeros(a,b);
Tik=zeros(a,c);
Tjk=zeros(b,c);
for i=1:a
  for j=1:b
    for k=1:c
      Ti(i)=Ti(i)+getx(X,i,j,k);
      Tj(j)=Tj(j)+getx(X,i,j,k);
      Tk(k)=Tk(k)+getx(X,i,j,k);
      Tij(i,j)=Tij(i,j)+getx(X,i,j,k);
      Tik(i,k)=Tik(i,k)+getx(X,i,j,k);
      Tjk(j,k)=Tjk(j,k)+getx(X,i,j,k);
    end
  end
end

Tijk=X;
T=sum(sum(X));

% get sum of squares %%%%%%%%%%%%%%%%%%%%%%%
SSa=0; SSb=0; SSc=0;
SSab=0; SSac=0; SSbc=0;
SSt=0; SSe=0;
for i=1:a
  SSa=SSa+ ((Ti(i).^2)/(b*c));
end
for j=1:b
  SSb=SSb+ Tj(j).^2/(a*c);
end
for k=1:c
  SSc=SSc+ Tk(k).^2/(a*b);
end
for i=1:a
  for j=1:b
    SSab=SSab+ Tij(i,j).^2/(c);
  end
end
for i=1:a
  for k=1:c
    SSac=SSac+ Tik(i,k).^2/(b);
  end
end
for j=1:b
  for k=1:c
    SSbc=SSbc+ Tjk(j,k).^2/(a);
  end
end
SSab=SSab - SSa - SSb + T.^2/(a*b*c);
SSac=SSac - SSa - SSc + T.^2/(a*b*c);
SSbc=SSbc - SSb - SSc + T.^2/(a*b*c);
SSa=SSa- T.^2/(a*b*c);
SSb=SSb- T.^2/(a*b*c);
SSc=SSc- T.^2/(a*b*c);
for i=1:a
  for j=1:b
    for k=1:c
      SSt=SSt+ getx(X,i,j,k).^2;
    end
  end
end
SSt=SSt- T.^2/(a*b*c);
SSe= SSt - SSa - SSb - SSc - SSab - SSac - SSbc;

% mean square
MSa=SSa/(a-1);
MSb=SSb/(b-1);
MSc=SSc/(c-1);
MSab=SSab/((a-1)*(b-1));
MSac=SSac/((a-1)*(c-1));
MSbc=SSbc/((b-1)*(c-1));
DFe=(a*b*c - 1)-(a-1)-(b-1)-(c-1)-((a-1)*(b-1))-((a-1)*(c-1))-((b-1)*(c-1));
MSe=SSe/DFe;

% F value
Fa=MSa/MSe;
Fb=MSb/MSe;
Fc=MSc/MSe;
Fab=MSab/MSe;
Fac=MSac/MSe;
Fbc=MSbc/MSe;

% display
[1   a-1          SSa   MSa   Fa
 2   b-1          SSb   MSb   Fb
 3   c-1          SSc   MSc   Fc
 12  (a-1)*(b-1)  SSab  MSab  Fab
 13  (a-1)*(c-1)  SSac  MSac  Fac
 23  (b-1)*(c-1)  SSbc  MSbc  Fbc
 0   DFe          SSe   MSe   0
 0   (a*b*c -1)   SSt   0     0]
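
The sum-of-squares bookkeeping in anova.m can be checked on a small example. The Python sketch below is an independent illustration, not a translation of the thesis code: it assumes one observation per cell stored as a nested list x[i][j][k] (the listing instead reads cells through a helper getx that is not reproduced here), and all names are hypothetical.

```python
from itertools import product


def three_factor_anova(x):
    """Sums of squares for a three-factor design with one observation per cell.

    x[i][j][k] holds the observation for level i of A, j of B and k of C.
    """
    a, b, c = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(a), range(b), range(c)))
    total = sum(x[i][j][k] for i, j, k in cells)
    cf = total ** 2 / (a * b * c)  # correction factor T^2 / (abc)

    def raw_ss(key, cells_per_total):
        """Sum of squared marginal totals divided by the cells in each total."""
        totals = {}
        for i, j, k in cells:
            idx = key(i, j, k)
            totals[idx] = totals.get(idx, 0.0) + x[i][j][k]
        return sum(t ** 2 for t in totals.values()) / cells_per_total

    ss = {}
    ss["A"] = raw_ss(lambda i, j, k: i, b * c) - cf
    ss["B"] = raw_ss(lambda i, j, k: j, a * c) - cf
    ss["C"] = raw_ss(lambda i, j, k: k, a * b) - cf
    ss["AB"] = raw_ss(lambda i, j, k: (i, j), c) - cf - ss["A"] - ss["B"]
    ss["AC"] = raw_ss(lambda i, j, k: (i, k), b) - cf - ss["A"] - ss["C"]
    ss["BC"] = raw_ss(lambda i, j, k: (j, k), a) - cf - ss["B"] - ss["C"]
    ss["total"] = sum(x[i][j][k] ** 2 for i, j, k in cells) - cf
    ss["error"] = ss["total"] - sum(ss[n] for n in ("A", "B", "C", "AB", "AC", "BC"))
    return ss


if __name__ == "__main__":
    # Purely additive 2x2x2 data: every interaction and the error are zero.
    data = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)] for i in range(2)]
    print(three_factor_anova(data))
```

With additive data the main effects absorb all of the variation, which makes this a convenient sanity check before applying the same bookkeeping to the 2 x 24 x 20 design of the experiments.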

C.3 drc.m

function [range,minvalue,maxvalue]=drc(matrix);
% Dynamic Range Compression (drc)
% return range between 0 and 1 of input matrix columns
% Written by Capt John K. Millhouse

[m n]=size(matrix);
temp=matrix-(ones(m,1)*min(matrix));
range=temp./(ones(m,1)*max(temp));
minvalue=min(matrix);
maxvalue=max(temp);
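
The column-wise scaling performed by drc.m can be sketched in Python as follows. This is an illustrative equivalent under the assumption that the matrix is a list of rows with no constant columns; the function name is hypothetical. As in the listing, the third return value is the span (maximum minus minimum) of each column rather than its maximum.

```python
def compress_columns(matrix):
    """Scale every column of a list-of-rows matrix into the range 0..1.

    Returns (scaled, column_minimums, column_spans), where span = max - min.
    Assumes each column contains at least two distinct values.
    """
    columns = list(zip(*matrix))
    mins = [min(col) for col in columns]
    spans = [max(col) - lo for col, lo in zip(columns, mins)]
    scaled = [
        [(value - lo) / span for value, lo, span in zip(row, mins, spans)]
        for row in matrix
    ]
    return scaled, mins, spans


if __name__ == "__main__":
    scaled, mins, spans = compress_columns([[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]])
    print(scaled, mins, spans)
```

Keeping the minimums and spans allows the scaling to be undone later as value * span + minimum.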

C.4  expmat1.m

% Setup for experiment one
% Creates a script file called 'run.experiment'
% Written by Capt John K. Millhouse

% Azimuths below 1000 select the actual filters; the same azimuths
% multiplied by 100 select the mean HRTF filters.
expaz=[10 32 45 58 69 82 98 111 123 135 148 169 191 212 225 237 ...
       249 262 278 291 303 315 328 349 1000 3200 4500 5800 6900 8200 ...
       9800 11100 12300 13500 14800 16900 19100 21200 22500 23700 24900 ...
       26200 27800 29100 30300 31500 32800 34900];

% randomly place sounds
row=randperm(48);

for i=1:48,
    snd(i)=expaz(row(i));
    temp=['0' int2str(snd(i))];
    % build a zero-padded three-digit azimuth label
    if snd(i)<100
        tsnd(i,1:3)=temp(1:3);
    elseif snd(i)<1000
        tsnd(i,1:3)=temp(2:4);
    elseif snd(i)<10000
        tsnd(i,1:3)=temp(1:3);
    else
        tsnd(i,1:3)=temp(2:4);
    end
end

fid = fopen('run.experiment','w');
fprintf(fid,'#!/bin/csh %s\n',' ');
fprintf(fid,'makecompass &%s\n',' ');
fprintf(fid,'introduction %s\n',' ');
fprintf(fid,'set ans = $< %s\n',' ');

for i=1:48,
    if snd(i)<1000
        tex1=['s32cplay -f40000 ../actualfilters/rl' int2str(snd(i)) '.d%s\n'];
        tex2=['s32cplay -f40000 ../actualfilters/rl' int2str(snd(i)) '.d ; ' ...
              'echo ''Aactual ' tsnd(i,:) ''' >> results.out %s\n'];
    else
        tex1=['s32cplay -f40000 ../meanHRTF/rl' int2str(snd(i)/100) '.d%s\n'];
        tex2=['s32cplay -f40000 ../meanHRTF/rl' int2str(snd(i)/100) '.d ; ' ...
              'echo ''Bactual ' tsnd(i,:) ''' >> results.out %s\n'];
    end

    % play the sound twice, one second apart, then prompt for a response
    fprintf(fid,tex1,' ');
    fprintf(fid,'sleep 1 %s\n',' ');
    fprintf(fid,tex2,' ');
    fprintf(fid,'clear %s\n',' ');
    fprintf(fid,'echo ''Select location and press return'' %s\n',' ');
    fprintf(fid,'set ans = $< %s\n',' ');
end

fprintf(fid,'s32cplay -f40000 thankyou.d %s\n',' ');
fclose(fid);
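
The branch on the size of snd(i) in the listing above produces a zero-padded three-digit azimuth label, after undoing the times-100 encoding that marks the second filter set. A minimal sketch of the same mapping is given below; the variable names codes and labels are illustrative and do not appear in the thesis code, and sprintf is used in place of the original string slicing.

```matlab
% Sketch only: equivalent three-digit labels for encoded azimuths.
% 'codes' and 'labels' are hypothetical names used for illustration.
codes = [10 111 1000 11100];      % two plain and two x100-encoded azimuths
labels = '';
for k = 1:length(codes)
    az = codes(k);
    if az >= 1000                 % x100 encoding marks the second filter set
        az = az/100;
    end
    labels(k,1:3) = sprintf('%03d', az);   % zero-pad to three digits
end
disp(labels)
```

Each row of labels plays the role of a row of tsnd in the listing.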

C.5  expmat2.m 

% Setup for experiment two
% Creates a script file called 'run_experiment'
% Written by Capt John K. Millhouse

% Azimuths below 1000 select the actual filters; the same azimuths
% multiplied by 100 select the network filters.
expaz=[10 32 45 58 69 82 98 111 123 135 148 169 191 212 225 237 ...
       249 262 278 291 303 315 328 349 1000 3200 4500 5800 6900 8200 ...
       9800 11100 12300 13500 14800 16900 19100 21200 22500 23700 24900 ...
       26200 27800 29100 30300 31500 32800 34900];

% randomly place sounds
row=randperm(48);

for i=1:48,
    snd(i)=expaz(row(i));
    temp=['0' int2str(snd(i))];
    % build a zero-padded three-digit azimuth label
    if snd(i)<100
        tsnd(i,1:3)=temp(1:3);
    elseif snd(i)<1000
        tsnd(i,1:3)=temp(2:4);
    elseif snd(i)<10000
        tsnd(i,1:3)=temp(1:3);
    else
        tsnd(i,1:3)=temp(2:4);
    end
end

fid = fopen('run_experiment','w');
fprintf(fid,'#!/bin/csh %s\n',' ');
fprintf(fid,'makecompass &%s\n',' ');
fprintf(fid,'introduction %s\n',' ');
fprintf(fid,'set ans = $< %s\n',' ');

for i=1:48,
    if snd(i)<1000
        tex1=['echo ''Aactual ' tsnd(i,:) ''' >> results.out; ' ...
              's32cplay -f40000 ../actualfilters/rl' int2str(snd(i)) '.d %s\n'];
        tex2=['s32cplay -f40000 ../actualfilters/rl' int2str(snd(i)) '.d%s\n'];
    else
        tex1=['echo ''Bactual ' tsnd(i,:) ''' >> results.out; ' ...
              's32cplay -f40000 ../netfilters/rl' int2str(snd(i)/100) '.d%s\n'];
        tex2=['s32cplay -f40000 ../netfilters/rl' int2str(snd(i)/100) '.d%s\n'];
    end

    % play the sound twice, one second apart, then prompt for a response
    fprintf(fid,tex1,' ');
    fprintf(fid,'sleep 1 %s\n',' ');
    fprintf(fid,tex2,' ');
    fprintf(fid,'clear %s\n',' ');
    fprintf(fid,'echo ''Select location and press return'' %s\n',' ');
    fprintf(fid,'set ans = $< %s\n',' ');
end

fprintf(fid,'s32cplay -f40000 thankyou.d %s\n',' ');
fclose(fid);

C.6  idrc.m 

function [matrix]=idrc(range,minvalue,maxvalue)
% inverse Dynamic Range Compression (idrc)
% return matrix to original values
% Written by Capt John K. Millhouse

[m n]=size(range);
temp=range.*(ones(m,1)*maxvalue);
matrix=temp+(ones(m,1)*minvalue);
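
As a check on how drc.m and idrc.m fit together, the sketch below repeats their arithmetic inline on a small made-up matrix and confirms that the inverse restores the input. The data values and the name restored are illustrative assumptions. Note that maxvalue, as returned by drc.m, is the maximum of the shifted columns (the column span), not the column maximum, which is what idrc.m needs.

```matlab
% Sketch only: round trip through the drc/idrc arithmetic on made-up data.
matrix = [1 10; 3 20; 5 40];
m = size(matrix,1);

% forward scaling, as in drc.m
minvalue = min(matrix);
temp = matrix - ones(m,1)*minvalue;
maxvalue = max(temp);                    % column span, not column maximum
range = temp ./ (ones(m,1)*maxvalue);    % each column now lies in [0,1]

% inverse scaling, as in idrc.m
restored = range .* (ones(m,1)*maxvalue) + ones(m,1)*minvalue;

disp(range)
disp(max(abs(restored(:) - matrix(:))))  % round-trip error
```

A column whose entries are all equal has a span of zero, so the forward step would divide by zero; such columns would need separate handling.
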
C.7  LRsphere.m

% Create left and right spheres of HRTF data
% Written by Capt John K. Millhouse

k=1;
for i=1:2:90,
    freq=i;
    [x y z c]=makesphere(features,freq,16);

    subplot(1,2,1);
    surf(x,y,z,c);
    caxis([0, 0.7915]);
    view(180,0);
    shading interp
    axis off
    title(num2str(features(i,4)))
    text(.1,1,0,'(@) <-- Right ear')
    xlabel('right side of head');

    subplot(1,2,2);
    surf(x,y,z,c);
    caxis([0, 0.7915]);
    view(0,0);
    shading interp
    axis off
    text(-.1,-1,0,'(@) <-- Left ear')
    text(-1.8,0,0,'<-- Nose -->')
    xlabel('left side of head');
    title('Hz');

    colormap(jet)
    k=k+1;
end

C.8  makesphere.m 

function [x,y,z,c]=makesphere(features,freq,size)
% freq range 1 to 93
% Create sphere with color corresponding to HRTF value
% Written by Capt John K. Millhouse

k=0;
clear az elev resp
% every 93rd row of features holds the same frequency bin
for i=freq:93:25296,
    k=k+1;
    az(k)=features(i,2);
    elev(k)=features(i,3);
    resp(k)=features(i,5);
end

c=mapsurf(az,elev,resp,size);
[x y z]=sphere(size);


C.9  mapsurf.m

function [c] = mapsurf(az,elev,value,spheresize)
% maps the HRTF value to the az and elev
% locations on a sphere
% Written by Capt John K. Millhouse

n=length(az);
I=0;
c = zeros(spheresize+1,spheresize+1);
v = c;

for I = 1:n,
    % column index from azimuth, row index from elevation
    a = fix(az(I)/360*(spheresize+1))+1;
    e = spheresize+1 - fix((elev(I)+90)/180*(spheresize+1));
    % clamp both indices to the grid
    if e>spheresize+1
        e=spheresize+1;
    end
    if e<1
        e=1;
    end
    if a>spheresize+1
        a=spheresize+1;
    end
    if a<1
        a=1;
    end
    v(e,a)=v(e,a)+1;
    c(e,a)=c(e,a)+value(I);
end

% average the values that fell into each cell
for I = 1:spheresize+1,
    for J = 1:spheresize+1,
        if v(I,J)==0
        else
            c(I,J) = c(I,J)/v(I,J);
        end
    end
end
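
To make the index arithmetic in mapsurf.m concrete, the sketch below applies the same two formulas and the same clamping to three made-up directions on a 17-by-17 grid (spheresize = 16, the value LRsphere.m passes in). The test directions are assumptions chosen to hit the grid edges; the clamping is written with min and max rather than the four if blocks of the listing.

```matlab
% Sketch only: azimuth/elevation to grid-cell indices, as in mapsurf.m.
spheresize = 16;
az   = [0 90 359];      % degrees, made-up test directions
elev = [-90 0 90];      % degrees

a = fix(az/360*(spheresize+1)) + 1;                     % column index
e = spheresize+1 - fix((elev+90)/180*(spheresize+1));   % row index

% clamp to 1..spheresize+1 (elev = 90 would otherwise give row 0)
a = min(max(a,1), spheresize+1);
e = min(max(e,1), spheresize+1);

disp([a; e])
```

Straight down (elev = -90) lands in the last row and straight up (elev = 90) in the first, so the row index runs from top to bottom of the sphere.
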


C.10  mapeval.m 

function [actual, netout, error]=mapeval(x, y, data)
% Written by Capt John K. Millhouse
% Converts LNKmap output to Matlab matrix suitable for
% display
% data in form [number actual netout error]

nx=length(x);
ny=length(y);
actual=zeros(nx,ny);
netout=zeros(nx,ny);
error=zeros(nx,ny);
k=0;

for i=1:nx,
    for j=1:ny,
        k=k+1;
        actual(i,j)=data(k,2);
        netout(i,j)=data(k,3);
        error(i,j)=data(k,4);
    end
end

actual=actual';
netout=netout';
error=error';

C.11  mypolar.m

function hpol = mypolar(theta,rho,line_style)
%POLAR  Polar coordinate plot.
%   POLAR(THETA, RHO) makes a plot using polar coordinates of
%   the angle THETA, in radians, versus the radius RHO.
%   POLAR(THETA,RHO,S) uses the linestyle specified in string S.
%   See PLOT for a description of legal linestyles.
%
%   See also PLOT, LOGLOG, SEMILOGX, SEMILOGY.

%   Copyright (c) 1984-93 by The MathWorks, Inc.
%   Modified by Capt John K. Millhouse to display
%   head figure and test locations.

if nargin < 1
    error('Requires 2 or 3 input arguments.')
elseif nargin == 2
    if isstr(rho)
        line_style = rho;
        rho = theta;
        [mr,nr] = size(rho);
        if mr == 1
            theta = 1:nr;
        else
            th = (1:mr)';
            theta = th(:,ones(1,nr));
        end
    else
        line_style = 'auto';
    end
elseif nargin == 1
    line_style = 'auto';
    rho = theta;
    [mr,nr] = size(rho);
    if mr == 1
        theta = 1:nr;
    else
        th = (1:mr)';
        theta = th(:,ones(1,nr));
    end
end
if isstr(theta) | isstr(rho)
    error('Input arguments must be numeric.');
end
if any(size(theta) ~= size(rho))
    error('THETA and RHO must be the same size.');
end

% get hold state
cax = newplot;
next = lower(get(cax,'NextPlot'));
hold_state = ishold;

% get x-axis text color so grid is in same color
tc = get(cax,'xcolor');

% only do grids if hold is off
if ~hold_state

% make a radial grid
    hold on;
    hhh=plot([0 max(theta(:))],[0 max(abs(rho(:)))]);
    v = [get(cax,'xlim') get(cax,'ylim')];
    ticks = length(get(cax,'ytick'));
    delete(hhh);

% check radial limits and ticks
    rmin = 0; rmax = v(4); rticks = ticks-1;
    if rticks > 5   % see if we can reduce the number
        if rem(rticks,2) == 0
            rticks = rticks/2;
        elseif rem(rticks,3) == 0
            rticks = rticks/3;
        end
    end

% define a circle
    th = 0:pi/50:2*pi;
    xunit = cos(th);
    yunit = sin(th);

% now really force points on x/y axes to lie on them exactly
    inds = [1:(length(th)-1)/4:length(th)];
    xunits(inds(2:2:4)) = zeros(2,1);
    yunits(inds(1:2:5)) = zeros(3,1);

% outline of the nose and ears for the head figure
    ynose=[.24; .3; .24]*rmax;
    xnose=[.05; 0; -.05]*rmax;
    year=[.05; .05; -.05; -.05]*rmax;
    xear=[.24; .3; .3; .24]*rmax;

    rinc = (rmax-rmin)/rticks;
    i=rmax;
    plot(xunit*i,yunit*i,':','color',tc,'linewidth',1);
    i=rmax/4;
    plot(xunit*i,yunit*i,'-','color',tc,'linewidth',1);
    plot(xnose,ynose,'-','color',tc,'linewidth',1);
    plot(xear,year,'-','color',tc,'linewidth',1);
    plot(-xear,year,'-','color',tc,'linewidth',1);

% plot spokes
    th = [10 32 45 58 69 82 98 111 123 135 148 169 191 ...
          212 225 237 249 262 278 291 303 315 328 349]*pi/180;
    cst = -sin(th);
    snt = cos(th);
    cs = [zeros(1,24); cst];
    sn = [zeros(1,24); snt];
    %plot(rmax*cs,rmax*sn,':','color',tc,'linewidth',1);

% annotate spokes in degrees
    rt = 1.1*rmax;
    for i = 1:max(size(th))
        text(rt*cst(i),rt*snt(i),int2str(th(i)*180/pi),...
            'horizontalalignment','center')
    end

% set view to 2-D
    view(0,90);

% set axis limits
    axis(rmax*[-1 1 -1.1 1.1]);
end

% transform data to Cartesian coordinates.
xx = rho.*(-sin(theta));
yy = rho.*cos(theta);

% plot data on top of grid
if strcmp(line_style,'auto')
    q = plot(xx,yy);
else
    q = plot(xx,yy,line_style);
end
if nargout > 0
    hpol = q;
end
if ~hold_state
    axis('equal');axis('off');
end

% reset hold state
if ~hold_state, set(cax,'NextPlot',next); end

C.12  plotresult.m

function [h]=plotresult(result,rquestion)
% plot results of a subject's response
% Written by Capt John K. Millhouse

if nargin < 2
    rquestion = 'nor';
end

rhos10=ones(1,360);
rhos20=ones(1,360);

for i=1:length(result),
    theta1=result(i,2)*pi/180;
    theta2=(result(i,3))*pi/180;
    lt='yo';
    if rquestion=='rev',
        % use the reversal-corrected response when its error is smaller
        if abs(result(i,4))>abs(result(i,6))
            theta2=(result(i,5))*pi/180;
            lt='r+';
        end
    end
    index=round(theta2*180/pi)+1;
    if result(i,1)==10
        rhos10(index)=rhos10(index)-.05;
        rho10=rhos10(index);
        subplot(1,2,1)
        mypolar(theta1,1,'gx')
        hold on
        mypolar(theta2,rho10,lt)
        mypolar([theta1; theta2],[1; rho10],'g-')
        title('With HRTF')
    else
        rhos20(index)=rhos20(index)-.05;
        rho20=rhos20(index);
        subplot(1,2,2)
        mypolar(theta1,1,'mx')
        hold on
        mypolar(theta2,rho20,lt)
        mypolar([theta1; theta2],[1; rho20],'m-')
        title('Without HRTF')
    end
end

for i=1:2
    subplot(1,2,i)
    hold off
end


C.13  plothist.m

function [correct10theta,correct20theta] = ...
    plothist(testall,test10,test20,rquestion,correctdeg)
% Written by Capt John K. Millhouse
% Plots Polar histogram of data of correct responses
% testall is all test responses
% test10 is test responses for filter1
% test20 is test responses for filter2
% rquestion is 'rev' reverses or 'nor' no reverses
% correctdeg is the number of degrees off and still correct

if nargin<5
    correctdeg=23;
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% plot histograms

actualtheta=testall(:,2)*pi/180;
testtheta=testall(:,3)*pi/180;
testwrevtheta=[test10(:,3)*pi/180; test20(:,3)*pi/180];
test10revtheta=[test10(:,3)*pi/180];
test20revtheta=[test20(:,3)*pi/180];
test10theta=testall(1:length(test10),3)*pi/180;
test20theta=testall(length(test10)+1:length(testall),3)*pi/180;

% start from empty lists so the concatenations below are defined
correct10theta=[];
correct20theta=[];

for i=1:length(test10),
    if abs(test10(i,4))<correctdeg,
        correct10theta=[correct10theta; test10(i,2)*pi/180];
    end
end

for i=1:length(test20),
    if abs(test20(i,4))<correctdeg,
        correct20theta=[correct20theta; test20(i,2)*pi/180];
    end
end

bins=180;   % number of bins in the histogram

% if the number of bins is >90 then offset the bins for display
if bins>90
    correct10theta=correct10theta+(pi/90);
    correct20theta=correct20theta-(pi/90);
    test10theta=test10theta+(pi/90);
    test20theta=test20theta-(pi/90);
end

[tact,ract]=rose(actualtheta,bins);
[tc10,rc10]=rose(correct10theta,bins);
[tc20,rc20]=rose(correct20theta,bins);
[t10,r10]=rose(test10theta,bins);
[t20,r20]=rose(test20theta,bins);

figure
mypolar(tact,ract/2,'w:')
hold on
mypolar(t10,r10,'w-.')
mypolar(t20,r20,'w-')
title(['Histogram of all responses'])
legend('w:','Actual','w-.','with HRTF','w-','without HRTF')
hold off

figure
mypolar(tact,ract/2,'w:')
hold on
mypolar(tc10,rc10,'w-.')
mypolar(tc20,rc20,'w-')
title(['Histogram of correct responses ' '(' rquestion ')'])
legend('w:','Actual','w-.','with HRTF correct','w-','without HRTF correct')
hold off

C.14  readall.m

function [rmserr,stderr,testall,revall,test10,test20] = ...
    readall(rquestion,subjects)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Read all the test subjects data and save to a  %
% file named allsubjects                         %
% by J. Millhouse                                %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%folder=['meanHRTFresults'];
folder=['results'];

if nargin < 1
    rquestion = 'nor';
end

if nargin<2
    if length(folder)==15
        % meanHRTFresults %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        subjects=['bsmith1.out','bsmith2.out','jmillhouse1.out',...
            'jmillhouse2.out','bmcq1.out','bmcq2.out',...
            'ceisenbies1.out','ceisenbies2.out','gharrup1.out',...
            'lmyers1.out','lmyers2.out','rdennery1.out',...
            'sstewart1.out','twilson1.out','twilson2.out',...
            'amillhouse1.out','bob1.out','jrmillhouse1.out',...
            'djennings1.out','rjreid1.out'];
    else
        % results %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        subjects=['ksmith1.out','rdennery1.out','jmillhouse1.out',...
            'jmillhouse2.out','bsmith1.out','twilson1.out',...
            'gharrup1.out','jrmillhouse1.out','jrmillhouse2.out',...
            'lmyers2.out','lmyers3.out','ddouglas1.out',...
            'ddouglas2.out','rmillhouse1.out','rmillhouse2.out',...
            'ceisenbies1.out','ceisenbies2.out','djennings1.out',...
            'srogers1.out','amillhouse1.out'];
    end
end

testall=[];
testall2=[];
revall=[];
start=1;
finish=1;

% subjects is one long string; each file name ends in '.out'
for i=4:length(subjects)
    j=i-3;
    if subjects(j:i)=='.out'
        finish=i;
        name=subjects(start:finish-4)
        [test rev]=readresult([folder '/' name '.out']);
        %figure
        %plotresult(test,rquestion);
        start=finish+1;
        testall=[testall; test];
        revall=[revall; rev];

        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% make output format
        eval([name '=zeros(24,3);']);
        f=sortdata(test(:,1:3),2);
        c=0;
        oldaz=0;
        for a=1:length(f)
            if f(a,1)==10,
                b=2;
            else
                b=3;
            end
            az=f(a,2);
            if oldaz<az  c=c+1;  end
            oldaz=az;
            eval([name '(c,1)=f(a,2);']);
            eval([name '(c,b)=f(a,3);']);
        end
        eval(['testall2=[testall2; ' name '];']);
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    end
end

eval(['save ' folder '/allsubjects testall2 -ascii -tabs'])
testall=sortdata(testall,1);
revall=sum(revall);

test10=[];
test20=[];
test=testall;

for i=1:length(testall)
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% get reversals
    if rquestion=='rev',
        if abs(testall(i,4))>abs(testall(i,6))
            % swap in the reversal-corrected response and error,
            % then flag the row with 999 in column 6
            test(i,4)=test(i,6);
            test(i,6)=test(i,3);
            test(i,3)=test(i,5);
            test(i,5)=test(i,6);
            test(i,6)=999;
        end
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% split into two
    if testall(i,1)==10
        test10=[test10; test(i,:)];
    else
        test20=[test20; test(i,:)];
    end
end

avgerr=[mean(test10(:,4)) mean(test20(:,4))];
rmserr=sqrt(avgerr.^2);   % magnitude of the mean error for each filter
stderr=[std(test10(:,4)) std(test20(:,4))];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% plot histograms
%plothist(testall,test10,test20,rquestion);


C.15  removerev.m 

function [y,z]=removerev(x)
% Written by Capt John K. Millhouse

[m,n]=size(x);
y=[];
z=[];

% rows flagged with 999 in the last column go to z, all others to y
for i=1:m,
    if x(i,n)<999,
        y=[y; x(i,:)];
    else
        z=[z; x(i,:)];
    end
end
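
The split performed by removerev.m relies on the 999 flag that readall.m writes into the last column of every reversal-corrected row. The sketch below shows the same split with logical indexing on three made-up rows; the data values and the assumed column order (filter, actual azimuth, response, error, alternate response, alternate error or flag) are illustrative, not taken from the thesis results.

```matlab
% Sketch only: separate reversal-flagged rows, as removerev.m does.
% Made-up rows; the last column holds 999 for reversal-corrected trials.
x = [10  45  50   5 130 999;
     10  98  90  -8  82  16;
     20 135 140   5  40 -95];

flag = x(:,end) >= 999;
y = x(~flag,:);    % trials without a reversal correction
z = x(flag,:);     % reversal-corrected trials

disp(size(y,1))
disp(size(z,1))
```
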


C.16  sortdata.m 

function [Y] = sortdata(file1,column)

[n,m]=size(file1);
[X,I]=sort(file1);

% reorder every row by the sort order of the chosen column
for j=1:n,
    for k=1:m,
        if k==column
            Y(j,k)=X(j,column);
        else
            Y(j,k)=file1(I(j,column),k);
        end
    end
end
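
The effect of sortdata.m is to reorder whole rows by ascending value of one column. A shorter way to express the same reordering is sketched below on a made-up matrix; the names data, vals and idx are illustrative, and sorting only the chosen column is a substitution for the listing's sort of the full matrix.

```matlab
% Sketch only: reorder rows by one column, equivalent in effect to
% sortdata(data,2) for this input.
data = [20 135 140;
        10  45  50;
        10  10  12];
column = 2;

[vals, idx] = sort(data(:,column));   % sort order of the chosen column
Y = data(idx,:);                      % apply that order to every column

disp(Y)
```
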